[jira] [Updated] (PIG-2909) Add a new option for ignoring corrupted files to AvroStorage load func

2012-09-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2909:
---

Attachment: PIG-2909-2.patch

 Add a new option for ignoring corrupted files to AvroStorage load func
 --

 Key: PIG-2909
 URL: https://issues.apache.org/jira/browse/PIG-2909
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Attachments: PIG-2909-2.patch, PIG-2909-avro_test_files.tar.gz, 
 PIG-2909.patch


 Currently, AvroStorage load fails with AvroRuntimeException when encountering 
 corrupted input files. For example,
 {code}
 ERROR 2997: Unable to recreate exception from backed error: 
 java.io.IOException: org.apache.avro.AvroRuntimeException: 
 java.io.IOException: Invalid sync!
   at 
 org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283)
 {code}
 But it is not always desirable to fail the Pig job for bad files. It is 
 sometimes more useful to skip them and continue.
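
 The skip-and-continue behavior requested here can be sketched in a few lines
 of Python. This is only an illustration of the idea, not the AvroStorage code
 (which is Java): read_records, CorruptFileError and the ignore_bad_files flag
 are hypothetical names, and a corrupt file is modeled simply as None.

```python
class CorruptFileError(Exception):
    """Stand-in for a reader error such as Avro's 'Invalid sync!'."""


def read_records(name, files):
    """Hypothetical reader: return a file's records, or raise if it is corrupt."""
    content = files[name]
    if content is None:  # None marks a corrupt file in this toy model
        raise CorruptFileError(f"Invalid sync! ({name})")
    return content


def load_all(files, ignore_bad_files=False):
    """Load every file; optionally skip corrupt ones instead of failing."""
    records, skipped = [], []
    for name in sorted(files):
        try:
            records.extend(read_records(name, files))
        except CorruptFileError:
            if not ignore_bad_files:
                raise  # behavior described in the issue: one bad file fails the load
            skipped.append(name)  # requested behavior: note the file and move on
    return records, skipped


files = {"part-0.avro": [1, 2], "part-1.avro": None, "part-2.avro": [3]}
records, skipped = load_all(files, ignore_bad_files=True)
print(records, skipped)
```

 Keeping the list of skipped files lets the caller report what was ignored
 rather than dropping data silently.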

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (PIG-2637) Command-line option -e throws TokenMgrError exception

2012-09-13 Thread fang fang chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fang fang chen reassigned PIG-2637:
---

Assignee: fang fang chen  (was: Richard Ding)

 Command-line option -e throws TokenMgrError exception
 -

 Key: PIG-2637
 URL: https://issues.apache.org/jira/browse/PIG-2637
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.2
Reporter: Richard Ding
Assignee: fang fang chen
Priority: Minor

 The command-line:
 {code}
 java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt';
 {code}
 fails with exception:
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 1, column 18.  
 Encountered: EOF after : 
 {code}



[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception

2012-09-13 Thread fang fang chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fang fang chen updated PIG-2637:


Attachment: PIG-2637.patch

 Command-line option -e throws TokenMgrError exception
 -

 Key: PIG-2637
 URL: https://issues.apache.org/jira/browse/PIG-2637
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.2
Reporter: Richard Ding
Assignee: fang fang chen
Priority: Minor
 Attachments: PIG-2637.patch


 The command-line:
 {code}
 java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt';
 {code}
 fails with exception:
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 1, column 18.  
 Encountered: EOF after : 
 {code}



[jira] [Commented] (PIG-2637) Command-line option -e throws TokenMgrError exception

2012-09-13 Thread fang fang chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454736#comment-13454736
 ] 

fang fang chen commented on PIG-2637:
-

Attached the patch file. The tests pass in my environment.

 Command-line option -e throws TokenMgrError exception
 -

 Key: PIG-2637
 URL: https://issues.apache.org/jira/browse/PIG-2637
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.2
Reporter: Richard Ding
Assignee: fang fang chen
Priority: Minor
 Attachments: PIG-2637.patch


 The command-line:
 {code}
 java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt';
 {code}
 fails with exception:
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 1, column 18.  
 Encountered: EOF after : 
 {code}



[jira] [Updated] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK

2012-09-13 Thread fang fang chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fang fang chen updated PIG-2405:


Attachment: 2405_2.patch
2405_1.patch

All unit tests now pass.
2405_1.patch is for TestDataModel, TestNewPlanLogToPhyTranslationVisitor, and
TestMRCompiler.
2405_2.patch is for TestPruneColumn.
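
In the TestDataModel failure quoted in the issue description, the expected and
actual strings differ only in the order of the two map entries. A minimal
Python sketch of an order-insensitive comparison follows. It is an
illustration only and not necessarily what 2405_1.patch does; parse_map is a
hypothetical helper.

```python
def parse_map(text):
    """Parse a rendering such as '[hello#world,goodbye#all]' into a dict."""
    inner = text.strip()[1:-1]  # drop the surrounding brackets
    return dict(entry.split("#", 1) for entry in inner.split(",") if entry)


expected = "[hello#world,goodbye#all]"
actual = "[goodbye#all,hello#world]"

# Comparing the rendered strings depends on entry order.
print(expected == actual)
# Comparing the parsed maps does not.
print(parse_map(expected) == parse_map(actual))
```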


 svn tags/release-0.9.1: some unit test case failed with open JDK
 

 Key: PIG-2405
 URL: https://issues.apache.org/jira/browse/PIG-2405
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.1
 Environment: ant-1.8.2
 open jdk: 1.6
Reporter: fang fang chen
Assignee: fang fang chen
 Attachments: 2405_1.patch, 2405_2.patch


 [junit] Test org.apache.pig.test.TestDataModel FAILED
 Testcase: testTupleToString took 0.004 sec
 FAILED
 toString expected:...ad a little 
 lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a 
 little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
 junit.framework.ComparisonFailure: toString expected:...ad a little 
 lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a 
 little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
  at 
 org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269
 [junit] Test org.apache.pig.test.TestHBaseStorage FAILED
 Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec
 Testcase: testHeterogeneousScans took 0.018 sec
 Caused an ERROR
 java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many 
 open files)
 java.lang.RuntimeException: java.io.FileNotFoundException: 
 /root/pigtest/conf/hadoop-site.xml (Too many open files)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162)
 at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035)
 at 
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:436)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:271)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:130)
 at 
 org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809)
 at 
 org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741)
 Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml 
 (Too many open files)
 at java.io.FileInputStream.<init>(FileInputStream.java:112)
 at java.io.FileInputStream.<init>(FileInputStream.java:72)
 at 
 sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
 at 
 sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
 at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
 Source)
 at 
 org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
 at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079)
 Caused an ERROR
 Could not resolve the DNS name of hostname:39611
 java.lang.IllegalArgumentException: Could not resolve the DNS name of 
 hostname:39611
 at 
 org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
 at 
 org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
 at 
 org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120)
 at 
 

[jira] [Commented] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK

2012-09-13 Thread fang fang chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454742#comment-13454742
 ] 

fang fang chen commented on PIG-2405:
-

Patch files are based on pig-trunk.

 svn tags/release-0.9.1: some unit test case failed with open JDK
 

 Key: PIG-2405
 URL: https://issues.apache.org/jira/browse/PIG-2405
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.1
 Environment: ant-1.8.2
 open jdk: 1.6
Reporter: fang fang chen
Assignee: fang fang chen
 Attachments: 2405_1.patch, 2405_2.patch


 [junit] Test org.apache.pig.test.TestDataModel FAILED
 Testcase: testTupleToString took 0.004 sec
 FAILED
 toString expected:...ad a little 
 lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a 
 little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
 junit.framework.ComparisonFailure: toString expected:...ad a little 
 lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a 
 little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
  at 
 org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269
 [junit] Test org.apache.pig.test.TestHBaseStorage FAILED
 Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec
 Testcase: testHeterogeneousScans took 0.018 sec
 Caused an ERROR
 java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many 
 open files)
 java.lang.RuntimeException: java.io.FileNotFoundException: 
 /root/pigtest/conf/hadoop-site.xml (Too many open files)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162)
 at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035)
 at 
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:436)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:271)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:130)
 at 
 org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809)
 at 
 org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741)
 Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml 
 (Too many open files)
 at java.io.FileInputStream.<init>(FileInputStream.java:112)
 at java.io.FileInputStream.<init>(FileInputStream.java:72)
 at 
 sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
 at 
 sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
 at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
 Source)
 at 
 org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
 at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079)
 Caused an ERROR
 Could not resolve the DNS name of hostname:39611
 java.lang.IllegalArgumentException: Could not resolve the DNS name of 
 hostname:39611
 at 
 org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
 at 
 org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
 at 
 org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120)
 at 
 org.apache.pig.test.TestHBaseStorage.tearDown(TestHBaseStorage.java:112)
 [junit] Test org.apache.pig.test.TestMRCompiler FAILED
 Testcase: 

[jira] [Commented] (PIG-2637) Command-line option -e throws TokenMgrError exception

2012-09-13 Thread fang fang chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454743#comment-13454743
 ] 

fang fang chen commented on PIG-2637:
-

Patch files are based on pig-trunk.

 Command-line option -e throws TokenMgrError exception
 -

 Key: PIG-2637
 URL: https://issues.apache.org/jira/browse/PIG-2637
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.2
Reporter: Richard Ding
Assignee: fang fang chen
Priority: Minor
 Attachments: PIG-2637.patch


 The command-line:
 {code}
 java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt';
 {code}
 fails with exception:
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 1, column 18.  
 Encountered: EOF after : 
 {code}



[jira] [Commented] (PIG-2744) Handle Pig command line with XML special characters

2012-09-13 Thread fang fang chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454745#comment-13454745
 ] 

fang fang chen commented on PIG-2744:
-

Patch files are based on pig-trunk.

 Handle Pig command line with XML special characters
 ---

 Key: PIG-2744
 URL: https://issues.apache.org/jira/browse/PIG-2744
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Richard Ding
Assignee: fang fang chen
 Attachments: PIG-2744.patch


 Pig stores the Pig command-line string in the Hadoop job XML file. This fails 
 if the command-line string contains XML special characters. Pig should treat 
 the command string like a Pig script and encode it first.
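
 To illustrate why encoding helps, here is a small Python sketch. It is not
 Pig's implementation: the property name pig.command.line, the XML template
 and the choice of Base64 are assumptions made for the example. It shows a
 raw command string breaking the XML, and the encoded form surviving a round
 trip.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical property layout, loosely modeled on a Hadoop job XML entry.
TEMPLATE = "<property><name>{name}</name><value>{value}</value></property>"


def is_well_formed(xml_text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


command_line = "-e a = filter b by x < 3 and name == 'a&b';"

# Written verbatim, the '<' and '&' make the document ill-formed.
raw = TEMPLATE.format(name="pig.command.line", value=command_line)
raw_ok = is_well_formed(raw)

# Encoded first (Base64 is an assumption here), the value is XML-safe.
encoded = base64.b64encode(command_line.encode("utf-8")).decode("ascii")
safe = TEMPLATE.format(name="pig.command.line", value=encoded)
safe_ok = is_well_formed(safe)

# Decoding the stored value recovers the original command line.
decoded = base64.b64decode(ET.fromstring(safe).findtext("value")).decode("utf-8")
print(raw_ok, safe_ok, decoded == command_line)
```

 XML escaping of the special characters would also work; encoding the whole
 string has the advantage that the reader does not depend on how the XML
 writer handles escapes.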



Linking bin/pig to some PATH dir

2012-09-13 Thread Jonas Grote
Hello,

in bin/pig there is a line that says "resolve links - $0 may be a
softlink", yet it does not seem to resolve the link for the script
itself.
I'm not sure what line 71 is intended to do; however, the attached patch
should allow users to link bin/pig into their preferred PATH directory
and still have pig run as usual.

Best regards

Jonas


[jira] [Commented] (PIG-1748) Add load/store function AvroStorage for avro data

2012-09-13 Thread deb ashish (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454762#comment-13454762
 ] 

deb ashish commented on PIG-1748:
-

REGISTER /path/avro-1.4.1.jar
REGISTER /path/json-simple-1.1.jar
REGISTER /path/piggybank.jar
REGISTER /path/jackson-core-asl-1.5.5.jar
REGISTER /path/jackson-mapper-asl-1.5.5.jar
avro = LOAD '/hdfs path/part-r-0.avro' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();

I'm trying this code, but it is unable to read the Avro file and shows the 
following exception:


Pig Stack Trace
---
ERROR 2997: Unable to recreate exception from backed error: Error: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias sc. Backend error : Unable to recreate exception from backed 
error: Error: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but 
interface was expected
at org.apache.pig.PigServer.openIterator(PigServer.java:742)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:406)
at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: 
Unable to recreate exception from backed error: Error: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
at org.apache.pig.PigServer.storeEx(PigServer.java:874)
at org.apache.pig.PigServer.store(PigServer.java:816)
at org.apache.pig.PigServer.openIterator(PigServer.java:728)
... 7 more

Please help me as soon as possible.



 Add load/store function AvroStorage for avro data
 -

 Key: PIG-1748
 URL: https://issues.apache.org/jira/browse/PIG-1748
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: lin guo
Assignee: lin guo
 Fix For: 0.9.0

 Attachments: avro_storage.patch, AvroStorageUtils-bagfix.patch, 
 avro_test_files.tar.gz, PIG-1748-2.patch, PIG-1748-3.patch


 We want to use Pig to process arbitrary Avro data and store results as Avro 
 files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
 Due to discrepancies of Avro and Pig data models, AvroStorage has:
 1. Limited support for record: we do not support recursively defined record 
 because the number of fields in such records is data dependent.
 2. Limited support for union: we only accept nullable union like [null, 
 some-type].
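
 As an illustration of the union restriction, the following Python sketch
 checks whether a schema, given as parsed Avro JSON, is a nullable union. It
 is not the AvroStorage code; is_nullable_union is a hypothetical helper, and
 accepting "null" in either position is an assumption.

```python
def is_nullable_union(schema):
    """True for a two-branch union in which exactly one branch is "null"."""
    return (
        isinstance(schema, list)  # Avro JSON writes a union as an array
        and len(schema) == 2
        and sum(1 for branch in schema if branch == "null") == 1
    )


print(is_nullable_union(["null", "string"]))         # nullable union
print(is_nullable_union(["int", "string"]))          # union without null
print(is_nullable_union(["null", "int", "string"]))  # more than two branches
print(is_nullable_union("string"))                   # not a union at all
```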
 For simplicity, we also make the following assumptions:
 If the input directory is a leaf directory, then we assume Avro data files in 
 it have the same schema;
 If the input directory contains sub-directories, then we assume Avro data 
 files in all sub-directories have the same schema.
 AvroStorage takes no input parameters when used as a LoadFunc (except for 
 debug [debug-level]). 
 Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
 don't, Avro schema of output data is derived from its 
 Pig schema.
 Detailed documentation can be found in 
 http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data



[jira] [Updated] (PIG-2919) Linking bin/pig to some PATH dir

2012-09-13 Thread Jonas Grote (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonas Grote updated PIG-2919:
-

Status: Patch Available  (was: Open)

 Linking bin/pig to some PATH dir
 

 Key: PIG-2919
 URL: https://issues.apache.org/jira/browse/PIG-2919
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Jonas Grote
Priority: Minor
 Attachments: pig-link.patch


 In bin/pig there is a line that says "resolve links - $0 may be a
 softlink", yet it does not seem to resolve the link for the script
 itself.
 I'm not sure what line 71 is intended to do; however, the attached patch
 should allow users to link bin/pig into their preferred PATH directory
 and still have pig run as usual.
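
 The effect of resolving the link can be shown with a short Python sketch.
 bin/pig itself is a shell script and the attached patch is not reproduced
 here; script_home and the directory names below are hypothetical. The point
 is only that the install directory must be derived from the resolved path,
 not from the location of the symlink.

```python
import os
import tempfile


def script_home(invoked_path):
    """Return the directory above bin/ for a script, following symlinks first."""
    real_script = os.path.realpath(invoked_path)  # resolves symlinks in the path
    return os.path.dirname(os.path.dirname(real_script))


with tempfile.TemporaryDirectory() as root:
    root = os.path.realpath(root)  # the temp dir itself may sit behind a symlink
    install = os.path.join(root, "pig-install")
    os.makedirs(os.path.join(install, "bin"))
    script = os.path.join(install, "bin", "pig")
    open(script, "w").close()

    # Link the script into a separate directory, as a user would for PATH.
    path_dir = os.path.join(root, "local-bin")
    os.makedirs(path_dir)
    link = os.path.join(path_dir, "pig")
    os.symlink(script, link)

    # Derived from the link's own location, the home directory is wrong.
    naive_home = os.path.dirname(os.path.dirname(link))
    # Derived from the resolved path, it is the real install directory.
    resolved_home = script_home(link)
    print(naive_home == install, resolved_home == install)
```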



[jira] [Commented] (PIG-1748) Add load/store function AvroStorage for avro data

2012-09-13 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454985#comment-13454985
 ] 

Jakob Homan commented on PIG-1748:
--

@deb - questions like these should be directed to the pig user list, not JIRA.  
You'll receive assistance there.

 Add load/store function AvroStorage for avro data
 -

 Key: PIG-1748
 URL: https://issues.apache.org/jira/browse/PIG-1748
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: lin guo
Assignee: lin guo
 Fix For: 0.9.0

 Attachments: avro_storage.patch, AvroStorageUtils-bagfix.patch, 
 avro_test_files.tar.gz, PIG-1748-2.patch, PIG-1748-3.patch


 We want to use Pig to process arbitrary Avro data and store results as Avro 
 files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
 Due to discrepancies of Avro and Pig data models, AvroStorage has:
 1. Limited support for record: we do not support recursively defined record 
 because the number of fields in such records is data dependent.
 2. Limited support for union: we only accept nullable union like [null, 
 some-type].
 For simplicity, we also make the following assumptions:
 If the input directory is a leaf directory, then we assume Avro data files in 
 it have the same schema;
 If the input directory contains sub-directories, then we assume Avro data 
 files in all sub-directories have the same schema.
 AvroStorage takes no input parameters when used as a LoadFunc (except for 
 debug [debug-level]). 
 Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
 don't, Avro schema of output data is derived from its 
 Pig schema.
 Detailed documentation can be found in 
 http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data



[jira] [Commented] (PIG-2909) Add a new option for ignoring corrupted files to AvroStorage load func

2012-09-13 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455082#comment-13455082
 ] 

Cheolsoo Park commented on PIG-2909:


I updated my patch. I also added cleanupOnSuccess() to AvroStorage for PIG-1891.

 Add a new option for ignoring corrupted files to AvroStorage load func
 --

 Key: PIG-2909
 URL: https://issues.apache.org/jira/browse/PIG-2909
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Attachments: PIG-2909-2.patch, PIG-2909-avro_test_files.tar.gz, 
 PIG-2909.patch


 Currently, AvroStorage load fails with AvroRuntimeException when encountering 
 corrupted input files. For example,
 {code}
 ERROR 2997: Unable to recreate exception from backed error: 
 java.io.IOException: org.apache.avro.AvroRuntimeException: 
 java.io.IOException: Invalid sync!
   at 
 org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283)
 {code}
 But it is not always desirable to fail the Pig job for bad files. It is 
 sometimes more useful to skip them and continue.



[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception

2012-09-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2637:
--

Status: Patch Available  (was: Open)

 Command-line option -e throws TokenMgrError exception
 -

 Key: PIG-2637
 URL: https://issues.apache.org/jira/browse/PIG-2637
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.2
Reporter: Richard Ding
Assignee: fang fang chen
Priority: Minor
 Attachments: PIG-2637.patch


 The command-line:
 {code}
 java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt';
 {code}
 fails with exception:
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 1, column 18.  
 Encountered: EOF after : 
 {code}



[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception

2012-09-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2637:


   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks fang fang!

 Command-line option -e throws TokenMgrError exception
 -

 Key: PIG-2637
 URL: https://issues.apache.org/jira/browse/PIG-2637
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.2
Reporter: Richard Ding
Assignee: fang fang chen
Priority: Minor
 Fix For: 0.11

 Attachments: PIG-2637.patch


 The command-line:
 {code}
 java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt';
 {code}
 fails with exception:
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 1, column 18.  
 Encountered: EOF after : 
 {code}



[jira] [Updated] (PIG-2909) Add a new option for ignoring corrupted files to AvroStorage load func

2012-09-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2909:


   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Patch 2 plus new tests checked in.  Thanks Cheolsoo.

 Add a new option for ignoring corrupted files to AvroStorage load func
 --

 Key: PIG-2909
 URL: https://issues.apache.org/jira/browse/PIG-2909
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.11

 Attachments: PIG-2909-2.patch, PIG-2909-avro_test_files.tar.gz, 
 PIG-2909.patch


 Currently, AvroStorage load fails with AvroRuntimeException when encountering 
 corrupted input files. For example,
 {code}
 ERROR 2997: Unable to recreate exception from backed error: 
 java.io.IOException: org.apache.avro.AvroRuntimeException: 
 java.io.IOException: Invalid sync!
   at 
 org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283)
 {code}
 But it is not always desirable to fail the Pig job for bad files. It is 
 sometimes more useful to skip them and continue.



[jira] [Updated] (PIG-2900) Streaming should provide conf settings in the environment

2012-09-13 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2900:
---

   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Committed to trunk.
Thanks for the review, Alan!

 Streaming should provide conf settings in the environment
 -

 Key: PIG-2900
 URL: https://issues.apache.org/jira/browse/PIG-2900
 Project: Pig
  Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.11

 Attachments: PIG-2900.1.patch, PIG-2900.patch


 Hadoop Streaming converts jobconf properties into environment variables; Pig 
 streaming does not. This is a useful feature that Pig streaming should 
 provide.



[jira] [Updated] (PIG-2744) Handle Pig command line with XML special characters

2012-09-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2744:


   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Patch committed to trunk.

 Handle Pig command line with XML special characters
 ---

 Key: PIG-2744
 URL: https://issues.apache.org/jira/browse/PIG-2744
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Richard Ding
Assignee: fang fang chen
 Fix For: 0.11

 Attachments: PIG-2744.patch


 Pig stores the Pig command-line string in the Hadoop job XML file. This fails 
 if the command-line string contains XML special characters. Pig should treat 
 the command string like a Pig script and encode it first.



[jira] [Updated] (PIG-2900) Streaming should provide conf settings in the environment

2012-09-13 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2900:
---

Release Note: 
The STREAM operator now makes all jobconf properties available to the programs 
processing streaming input via environment variables, consistent with Hadoop 
Streaming behavior.
All "." characters in the jobconf property names are replaced with underscores ("_").
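
The renaming rule can be sketched in Python as follows. This is an
illustration, not Pig's code: jobconf_to_env is a hypothetical helper, the
property names are examples only, and converting values with str() is an
assumption. Only the "." to "_" replacement comes from the release note.

```python
def jobconf_to_env(jobconf):
    """Map jobconf properties to environment variables, replacing '.' with '_' in names."""
    return {name.replace(".", "_"): str(value) for name, value in jobconf.items()}


jobconf = {"mapred.job.name": "wordcount", "mapred.reduce.tasks": 4}
env = jobconf_to_env(jobconf)
print(env)
```

A streaming program would then read, for example, the variable
mapred_job_name from its environment.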

 Streaming should provide conf settings in the environment
 -

 Key: PIG-2900
 URL: https://issues.apache.org/jira/browse/PIG-2900
 Project: Pig
  Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.11

 Attachments: PIG-2900.1.patch, PIG-2900.patch


 Hadoop Streaming converts jobconf properties into environment variables; Pig 
 streaming does not. This is a useful feature that Pig streaming should 
 provide.



Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

2012-09-13 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
---

(Updated Sept. 13, 2012, 10:46 p.m.)


Review request for pig and Santhosh Srinivasan.


Changes
---

Rebased the patch to trunk.


Description
---

Add support for multiple avro schemas to AvroStorage. This patch is based on 
Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
https://issues.apache.org/jira/browse/PIG-2579


Diffs (updated)
-

  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
 d7a004f 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
 84280af 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
 fb5cc25 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
 75057f9 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
 1f6e581 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
---

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park



[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

2012-09-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2579:
---

Attachment: PIG-2579-3.patch

I rebased the patch to trunk.

 Support for multiple input schemas in AvroStorage
 -

 Key: PIG-2579
 URL: https://issues.apache.org/jira/browse/PIG-2579
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.9.2, 0.11
Reporter: Stan Rosenberg
Assignee: Cheolsoo Park
Priority: Minor
 Attachments: avro_storage_union_schema.patch, 
 avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, 
 PIG-2579-2.patch, PIG-2579-3.patch


 This is a barebones patch for AvroStorage which enables support of multiple 
 input schemas.  The assumption is that the input consists of avro files 
 having different schemas that can be unioned, e.g., flat records.  
 A simple illustrative example is attached 
 (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by 
 create_avro2.pig, followed by read_avro.pig.
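
 One way to picture "schemas that can be unioned" for flat records is the
 Python sketch below. It is not the patch's merge logic, whose rules are not
 described here: merge_flat_records is a hypothetical helper that unions the
 fields of two record schemas (given as parsed Avro JSON) and rejects a field
 that appears with two different types.

```python
def merge_flat_records(left, right):
    """Union the fields of two flat record schemas, keeping first-seen order."""
    merged = {}
    for schema in (left, right):
        for field in schema["fields"]:
            name, ftype = field["name"], field["type"]
            if name in merged and merged[name] != ftype:
                raise ValueError(f"conflicting types for field {name!r}")
            merged.setdefault(name, ftype)
    return {
        "type": "record",
        "name": left["name"],  # assumption: keep the first schema's record name
        "fields": [{"name": n, "type": t} for n, t in merged.items()],
    }


schema1 = {"type": "record", "name": "r", "fields": [
    {"name": "id", "type": "int"}, {"name": "a", "type": "string"}]}
schema2 = {"type": "record", "name": "r", "fields": [
    {"name": "id", "type": "int"}, {"name": "b", "type": "long"}]}

merged = merge_flat_records(schema1, schema2)
print([f["name"] for f in merged["fields"]])
```

 A real implementation would also have to decide how to represent a field
 that is missing from some input files, for example by making it nullable.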
