[jira] [Updated] (PIG-2909) Add a new option for ignoring corrupted files to AvroStorage load func
[ https://issues.apache.org/jira/browse/PIG-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-2909: --- Attachment: PIG-2909-2.patch Add a new option for ignoring corrupted files to AvroStorage load func -- Key: PIG-2909 URL: https://issues.apache.org/jira/browse/PIG-2909 Project: Pig Issue Type: Improvement Components: piggybank Affects Versions: 0.10.0 Reporter: Cheolsoo Park Assignee: Cheolsoo Park Attachments: PIG-2909-2.patch, PIG-2909-avro_test_files.tar.gz, PIG-2909.patch Currently, AvroStorage load fails with AvroRuntimeException when encountering corrupted input files. For example, {code} ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync! at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283) {code} But it is not always desirable to fail the Pig job for bad files. It is sometimes more useful to skip them and continue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
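The skip-and-continue behaviour requested in PIG-2909 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the AvroStorage patch itself: `load_all`, `read_records`, and the `ignore_bad_files` flag are hypothetical names, and `ValueError` stands in for the AvroRuntimeException shown in the report.

```python
import logging

log = logging.getLogger("loader")


def load_all(paths, read_records, ignore_bad_files=False):
    """Yield records from every path, optionally skipping files that fail to decode.

    read_records is a hypothetical callable that returns the records of one
    file and raises ValueError when the file is corrupted.
    """
    for path in paths:
        try:
            # Read one file completely first, so a corrupted file
            # contributes no partial output.
            records = list(read_records(path))
        except ValueError as err:
            if not ignore_bad_files:
                raise  # default behaviour: fail the whole load
            log.warning("skipping corrupted file %s: %s", path, err)
            continue
        yield from records
```

Keeping the flag off by default preserves the current fail-fast behaviour, so skipping bad input is something a user has to ask for explicitly.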
[jira] [Assigned] (PIG-2637) Command-line option -e throws TokenMgrError exception
[ https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fang fang chen reassigned PIG-2637: --- Assignee: fang fang chen (was: Richard Ding) Command-line option -e throws TokenMgrError exception - Key: PIG-2637 URL: https://issues.apache.org/jira/browse/PIG-2637 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.9.2 Reporter: Richard Ding Assignee: fang fang chen Priority: Minor The command-line: {code} java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt'; {code} fails with exception: {code} ERROR 1000: Error during parsing. Lexical error at line 1, column 18. Encountered: EOF after : {code}
[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception
[ https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fang fang chen updated PIG-2637: Attachment: PIG-2637.patch Command-line option -e throws TokenMgrError exception - Key: PIG-2637 URL: https://issues.apache.org/jira/browse/PIG-2637 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.9.2 Reporter: Richard Ding Assignee: fang fang chen Priority: Minor Attachments: PIG-2637.patch The command-line: {code} java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt'; {code} fails with exception: {code} ERROR 1000: Error during parsing. Lexical error at line 1, column 18. Encountered: EOF after : {code}
[jira] [Commented] (PIG-2637) Command-line option -e throws TokenMgrError exception
[ https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454736#comment-13454736 ] fang fang chen commented on PIG-2637: - Attached the patch file. Tests passed in my environment. Command-line option -e throws TokenMgrError exception - Key: PIG-2637 URL: https://issues.apache.org/jira/browse/PIG-2637 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.9.2 Reporter: Richard Ding Assignee: fang fang chen Priority: Minor Attachments: PIG-2637.patch The command-line: {code} java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt'; {code} fails with exception: {code} ERROR 1000: Error during parsing. Lexical error at line 1, column 18. Encountered: EOF after : {code}
[jira] [Updated] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK
[ https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fang fang chen updated PIG-2405: Attachment: 2405_2.patch 2405_1.patch Currently all unit tests pass. 2405_1.patch is for TestDataModel, TestNewPlanLogToPhyTranslationVisitor, and TestMRCompiler; 2405_2.patch is for TestPruneColumn. svn tags/release-0.9.1: some unit test case failed with open JDK Key: PIG-2405 URL: https://issues.apache.org/jira/browse/PIG-2405 Project: Pig Issue Type: Bug Affects Versions: 0.9.1 Environment: ant-1.8.2 open jdk: 1.6 Reporter: fang fang chen Assignee: fang fang chen Attachments: 2405_1.patch, 2405_2.patch [junit] Test org.apache.pig.test.TestDataModel FAILED Testcase: testTupleToString took 0.004 sec FAILED toString expected:...ad a little lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a little lamb)},[[goodbye#all,hello#world]],42,50,3.14... junit.framework.ComparisonFailure: toString expected:...ad a little lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
at org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269 [junit] Test org.apache.pig.test.TestHBaseStorage FAILED Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec Testcase: testHeterogeneousScans took 0.018 sec Caused an ERROR java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) java.lang.RuntimeException: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980) at org.apache.hadoop.conf.Configuration.get(Configuration.java:436) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:271) at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:167) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:130) at org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809) at org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741) Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) at java.io.FileInputStream.init(FileInputStream.java:112) at java.io.FileInputStream.init(FileInputStream.java:72) at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70) at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161) at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source) at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at 
org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(Unknown Source) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079) Caused an ERROR Could not resolve the DNS name of hostname:39611 java.lang.IllegalArgumentException: Could not resolve the DNS name of hostname:39611 at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105) at org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:66) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:145) at org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120) at
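The testTupleToString failure quoted above differs only in the order of the two map entries, which suggests (this is an assumption drawn from the expected/actual strings; the patch contents are not shown here) that the test compares a string rendering whose entry order depends on the JDK's hash map iteration. A minimal Python sketch of an order-insensitive comparison, with `parse_map` as a hypothetical helper:

```python
def parse_map(text):
    """Parse a map rendered as '[key#value,key#value]' into a dict.

    Hypothetical helper for illustration; it assumes keys and values
    contain neither ',' nor '#'.
    """
    inner = text.strip()[1:-1]
    if not inner:
        return {}
    return dict(entry.split("#", 1) for entry in inner.split(","))


expected = "[hello#world,goodbye#all]"
actual = "[goodbye#all,hello#world]"

# The raw strings differ, but the maps they describe are equal.
strings_match = expected == actual
maps_match = parse_map(expected) == parse_map(actual)
```

Comparing the parsed structures rather than the rendered strings keeps the assertion independent of whichever iteration order a given JDK happens to produce.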
[jira] [Commented] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK
[ https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454742#comment-13454742 ] fang fang chen commented on PIG-2405: - Patch files are based on pig-trunk. svn tags/release-0.9.1: some unit test case failed with open JDK Key: PIG-2405 URL: https://issues.apache.org/jira/browse/PIG-2405 Project: Pig Issue Type: Bug Affects Versions: 0.9.1 Environment: ant-1.8.2 open jdk: 1.6 Reporter: fang fang chen Assignee: fang fang chen Attachments: 2405_1.patch, 2405_2.patch [junit] Test org.apache.pig.test.TestDataModel FAILED Testcase: testTupleToString took 0.004 sec FAILED toString expected:...ad a little lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a little lamb)},[[goodbye#all,hello#world]],42,50,3.14... junit.framework.ComparisonFailure: toString expected:...ad a little lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a little lamb)},[[goodbye#all,hello#world]],42,50,3.14... at org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269 [junit] Test org.apache.pig.test.TestHBaseStorage FAILED Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec Testcase: testHeterogeneousScans took 0.018 sec Caused an ERROR java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) java.lang.RuntimeException: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980) at org.apache.hadoop.conf.Configuration.get(Configuration.java:436) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:271) at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155) at 
org.apache.hadoop.hbase.client.HTable.init(HTable.java:167) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:130) at org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809) at org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741) Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) at java.io.FileInputStream.init(FileInputStream.java:112) at java.io.FileInputStream.init(FileInputStream.java:72) at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70) at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161) at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source) at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(Unknown Source) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079) Caused an ERROR Could not resolve the DNS name of hostname:39611 java.lang.IllegalArgumentException: Could not resolve the DNS name of hostname:39611 at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105) at org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:66) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555) at 
org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:145) at org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120) at org.apache.pig.test.TestHBaseStorage.tearDown(TestHBaseStorage.java:112) [junit] Test org.apache.pig.test.TestMRCompiler FAILED Testcase:
[jira] [Commented] (PIG-2637) Command-line option -e throws TokenMgrError exception
[ https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454743#comment-13454743 ] fang fang chen commented on PIG-2637: - Patch files are based on pig-trunk. Command-line option -e throws TokenMgrError exception - Key: PIG-2637 URL: https://issues.apache.org/jira/browse/PIG-2637 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.9.2 Reporter: Richard Ding Assignee: fang fang chen Priority: Minor Attachments: PIG-2637.patch The command-line: {code} java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt'; {code} fails with exception: {code} ERROR 1000: Error during parsing. Lexical error at line 1, column 18. Encountered: EOF after : {code}
[jira] [Commented] (PIG-2744) Handle Pig command line with XML special characters
[ https://issues.apache.org/jira/browse/PIG-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454745#comment-13454745 ] fang fang chen commented on PIG-2744: - Patch files are based on pig-trunk. Handle Pig command line with XML special characters --- Key: PIG-2744 URL: https://issues.apache.org/jira/browse/PIG-2744 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Richard Ding Assignee: fang fang chen Attachments: PIG-2744.patch Pig stores Pig command line string to the Hadoop job XML file. It will fail if the command line string contains XML special characters. Pig should treat the command string like Pig script by first encoding it.
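To illustrate the failure mode described in PIG-2744, here is a small Python sketch. It assumes the command line is written as the text of a `<value>` element; the property name `pig.command.line` and the helper `property_xml` are used for illustration only, and entity escaping is just one possible encoding (the actual patch may encode the string differently).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEMPLATE = "<property><name>%s</name><value>%s</value></property>"


def property_xml(name, value):
    """Render one job-XML property, escaping XML special characters."""
    return TEMPLATE % (escape(name), escape(value))


# A command line containing '<', '>' and '&', all special in XML.
cmd = "pig -e \"a = load 'in'; b = filter a by $0 < 5 && $1 > 2;\""

# Written verbatim, the document is not well-formed.
raw = TEMPLATE % ("pig.command.line", cmd)
try:
    ET.fromstring(raw)
    raw_parses = True
except ET.ParseError:
    raw_parses = False

# Escaped first, it parses and the original string is recovered intact.
restored = ET.fromstring(property_xml("pig.command.line", cmd)).find("value").text
```

Whatever encoding is chosen, the property that matters is the round trip: the reader of the job XML must get back exactly the string that was submitted.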
Linking bin/pig to some PATH dir
Hello, in bin/pig there is a line that says "resolve links - $0 may be a softlink", yet the script does not seem to resolve the link for the script itself. I'm not sure what line 71 is intended to do; however, the attached patch should allow users to link bin/pig into their preferred PATH directory and still have pig run as usual. Best regards, Jonas
[jira] [Commented] (PIG-1748) Add load/store function AvroStorage for avro data
[ https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454762#comment-13454762 ] deb ashish commented on PIG-1748: - REGISTER /path/avro-1.4.1.jar REGISTER /path/json-simple-1.1.jar REGISTER /path/piggybank.jar REGISTER /path/jackson-core-asl-1.5.5.jar REGISTER /path/jackson-mapper-asl-1.5.5.jar avro = LOAD '/hdfs path/part-r-0.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage(); I'm trying this code, but it is unable to read the Avro file and shows the following exception: Pig Stack Trace --- ERROR 2997: Unable to recreate exception from backed error: Error: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sc. Backend error : Unable to recreate exception from backed error: Error: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected at org.apache.pig.PigServer.openIterator(PigServer.java:742) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90) at org.apache.pig.Main.run(Main.java:406) at org.apache.pig.Main.main(Main.java:107) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Error: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151) at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198) at org.apache.pig.PigServer.storeEx(PigServer.java:874) at org.apache.pig.PigServer.store(PigServer.java:816) at org.apache.pig.PigServer.openIterator(PigServer.java:728) ... 7 more please help me asap Add load/store function AvroStorage for avro data - Key: PIG-1748 URL: https://issues.apache.org/jira/browse/PIG-1748 Project: Pig Issue Type: Improvement Components: impl Reporter: lin guo Assignee: lin guo Fix For: 0.9.0 Attachments: avro_storage.patch, AvroStorageUtils-bagfix.patch, avro_test_files.tar.gz, PIG-1748-2.patch, PIG-1748-3.patch We want to use Pig to process arbitrary Avro data and store results as Avro files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. Due to discrepancies of Avro and Pig data models, AvroStorage has: 1. Limited support for record: we do not support recursively defined record because the number of fields in such records is data dependent. 2. Limited support for union: we only accept nullable union like [null, some-type]. For simplicity, we also make the following assumptions: If the input directory is a leaf directory, then we assume Avro data files in it have the same schema; If the input directory contains sub-directories, then we assume Avro data files in all sub-directories have the same schema. AvroStorage takes no input parameters when used as a LoadFunc (except for debug [debug-level]). Users can provide parameters to AvroStorage when used as a StoreFunc. If they don't, Avro schema of output data is derived from its Pig schema. Detailed documentation can be found in http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
[jira] [Updated] (PIG-2919) Linking bin/pig to some PATH dir
[ https://issues.apache.org/jira/browse/PIG-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonas Grote updated PIG-2919: - Status: Patch Available (was: Open) Linking bin/pig to some PATH dir Key: PIG-2919 URL: https://issues.apache.org/jira/browse/PIG-2919 Project: Pig Issue Type: Improvement Affects Versions: 0.10.0 Reporter: Jonas Grote Priority: Minor Attachments: pig-link.patch In bin/pig there is a line that says "resolve links - $0 may be a softlink", yet the script does not seem to resolve the link for the script itself. I'm not sure what line 71 is intended to do; however, the attached patch should allow users to link bin/pig into their preferred PATH directory and still have pig run as usual.
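The idea behind PIG-2919 (find the real install directory even when the launcher is invoked through a symlink) can be sketched as follows. This is a Python illustration rather than the shell used by bin/pig, it is not the attached patch, and `script_home` is a hypothetical name; it assumes the launcher lives at `<install root>/bin/<script>`.

```python
import os


def script_home(argv0):
    """Return the install root for a launcher located at <root>/bin/<script>.

    os.path.realpath follows every symlink in the path, so the result is
    the same whether argv0 is the real script or a link to it placed in
    some other directory on PATH.
    """
    real_script = os.path.realpath(argv0)
    bin_dir = os.path.dirname(real_script)
    return os.path.dirname(bin_dir)
```

Without the resolution step, a launcher linked into, say, a user's personal bin directory would compute its home relative to the link's location and fail to find its jars and configuration.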
[jira] [Commented] (PIG-1748) Add load/store function AvroStorage for avro data
[ https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454985#comment-13454985 ] Jakob Homan commented on PIG-1748: -- @deb - questions like these should be directed to the pig user list, not JIRA. You'll receive assistance there. Add load/store function AvroStorage for avro data - Key: PIG-1748 URL: https://issues.apache.org/jira/browse/PIG-1748 Project: Pig Issue Type: Improvement Components: impl Reporter: lin guo Assignee: lin guo Fix For: 0.9.0 Attachments: avro_storage.patch, AvroStorageUtils-bagfix.patch, avro_test_files.tar.gz, PIG-1748-2.patch, PIG-1748-3.patch We want to use Pig to process arbitrary Avro data and store results as Avro files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. Due to discrepancies of Avro and Pig data models, AvroStorage has: 1. Limited support for record: we do not support recursively defined record because the number of fields in such records is data dependent. 2. Limited support for union: we only accept nullable union like [null, some-type]. For simplicity, we also make the following assumptions: If the input directory is a leaf directory, then we assume Avro data files in it have the same schema; If the input directory contains sub-directories, then we assume Avro data files in all sub-directories have the same schema. AvroStorage takes no input parameters when used as a LoadFunc (except for debug [debug-level]). Users can provide parameters to AvroStorage when used as a StoreFunc. If they don't, Avro schema of output data is derived from its Pig schema. Detailed documentation can be found in http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
[jira] [Commented] (PIG-2909) Add a new option for ignoring corrupted files to AvroStorage load func
[ https://issues.apache.org/jira/browse/PIG-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455082#comment-13455082 ] Cheolsoo Park commented on PIG-2909: I updated my patch. I also added cleanupOnSuccess() to AvroStorage for PIG-1891. Add a new option for ignoring corrupted files to AvroStorage load func -- Key: PIG-2909 URL: https://issues.apache.org/jira/browse/PIG-2909 Project: Pig Issue Type: Improvement Components: piggybank Affects Versions: 0.10.0 Reporter: Cheolsoo Park Assignee: Cheolsoo Park Attachments: PIG-2909-2.patch, PIG-2909-avro_test_files.tar.gz, PIG-2909.patch Currently, AvroStorage load fails with AvroRuntimeException when encountering corrupted input files. For example, {code} ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync! at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283) {code} But it is not always desirable to fail the Pig job for bad files. It is sometimes more useful to skip them and continue.
[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception
[ https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2637: -- Status: Patch Available (was: Open) Command-line option -e throws TokenMgrError exception - Key: PIG-2637 URL: https://issues.apache.org/jira/browse/PIG-2637 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.9.2 Reporter: Richard Ding Assignee: fang fang chen Priority: Minor Attachments: PIG-2637.patch The command-line: {code} java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt'; {code} fails with exception: {code} ERROR 1000: Error during parsing. Lexical error at line 1, column 18. Encountered: EOF after : {code}
[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception
[ https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2637: Resolution: Fixed Fix Version/s: 0.11 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks fang fang! Command-line option -e throws TokenMgrError exception - Key: PIG-2637 URL: https://issues.apache.org/jira/browse/PIG-2637 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.9.2 Reporter: Richard Ding Assignee: fang fang chen Priority: Minor Fix For: 0.11 Attachments: PIG-2637.patch The command-line: {code} java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt'; {code} fails with exception: {code} ERROR 1000: Error during parsing. Lexical error at line 1, column 18. Encountered: EOF after : {code}
[jira] [Updated] (PIG-2909) Add a new option for ignoring corrupted files to AvroStorage load func
[ https://issues.apache.org/jira/browse/PIG-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2909: Resolution: Fixed Fix Version/s: 0.11 Status: Resolved (was: Patch Available) Patch 2 plus new tests checked in. Thanks Cheolsoo. Add a new option for ignoring corrupted files to AvroStorage load func -- Key: PIG-2909 URL: https://issues.apache.org/jira/browse/PIG-2909 Project: Pig Issue Type: Improvement Components: piggybank Affects Versions: 0.10.0 Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.11 Attachments: PIG-2909-2.patch, PIG-2909-avro_test_files.tar.gz, PIG-2909.patch Currently, AvroStorage load fails with AvroRuntimeException when encountering corrupted input files. For example, {code} ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync! at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:283) {code} But it is not always desirable to fail the Pig job for bad files. It is sometimes more useful to skip them and continue.
[jira] [Updated] (PIG-2900) Streaming should provide conf settings in the environment
[ https://issues.apache.org/jira/browse/PIG-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-2900: --- Resolution: Fixed Fix Version/s: 0.11 Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the review, Alan! Streaming should provide conf settings in the environment - Key: PIG-2900 URL: https://issues.apache.org/jira/browse/PIG-2900 Project: Pig Issue Type: New Feature Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.11 Attachments: PIG-2900.1.patch, PIG-2900.patch Hadoop Streaming converts jobconf properties into environment variables; Pig streaming does not. This is a useful feature that Pig streaming should provide.
[jira] [Updated] (PIG-2744) Handle Pig command line with XML special characters
[ https://issues.apache.org/jira/browse/PIG-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2744: Resolution: Fixed Fix Version/s: 0.11 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to trunk. Handle Pig command line with XML special characters --- Key: PIG-2744 URL: https://issues.apache.org/jira/browse/PIG-2744 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Richard Ding Assignee: fang fang chen Fix For: 0.11 Attachments: PIG-2744.patch Pig stores Pig command line string to the Hadoop job XML file. It will fail if the command line string contains XML special characters. Pig should treat the command string like Pig script by first encoding it.
[jira] [Updated] (PIG-2900) Streaming should provide conf settings in the environment
[ https://issues.apache.org/jira/browse/PIG-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-2900: --- Release Note: The STREAM operator now makes all jobconf properties available to the programs processing streaming input via environment variables, consistent with Hadoop Streaming behavior. All "." characters in the jobconf property names are replaced with underscores ("_"). Streaming should provide conf settings in the environment - Key: PIG-2900 URL: https://issues.apache.org/jira/browse/PIG-2900 Project: Pig Issue Type: New Feature Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.11 Attachments: PIG-2900.1.patch, PIG-2900.patch Hadoop Streaming converts jobconf properties into environment variables; Pig streaming does not. This is a useful feature that Pig streaming should provide.
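The naming rule in the release note can be shown with a short Python sketch. `jobconf_to_env` is a hypothetical helper and the property values are made up; it applies only the rule stated in the note (dots become underscores) and then reads one variable from a child process, the way a streaming program would.

```python
import os
import subprocess
import sys


def jobconf_to_env(conf):
    """Map jobconf property names to environment variable names.

    Only the rule from the release note is applied: every '.' in the
    property name becomes '_'. Values are passed through as strings.
    """
    return {name.replace(".", "_"): str(value) for name, value in conf.items()}


# Illustrative properties; the values are invented for the example.
conf = {"mapred.job.name": "wordcount", "mapred.reduce.tasks": 4}
child_env = dict(os.environ, **jobconf_to_env(conf))

# A streaming program sees the setting as an ordinary environment variable.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['mapred_job_name'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
```

The renaming is needed because many shells do not accept "." in variable names, so a script could not otherwise refer to the property directly.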
Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6884/ --- (Updated Sept. 13, 2012, 10:46 p.m.) Review request for pig and Santhosh Srinivasan. Changes --- Rebased the patch to trunk. Description --- Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work. Please see https://issues.apache.org/jira/browse/PIG-2579 for details This addresses bug PIG-2579. https://issues.apache.org/jira/browse/PIG-2579 Diffs (updated) - contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a Diff: https://reviews.apache.org/r/6884/diff/ Testing --- New unit tests are added: - TestAvroStorageUtils.testMergeSchema - TestAvroStorage.testMultipleSchemas1,2 Thanks, Cheolsoo Park
[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-2579: --- Attachment: PIG-2579-3.patch I rebased the patch to trunk. Support for multiple input schemas in AvroStorage - Key: PIG-2579 URL: https://issues.apache.org/jira/browse/PIG-2579 Project: Pig Issue Type: New Feature Components: piggybank Affects Versions: 0.9.2, 0.11 Reporter: Stan Rosenberg Assignee: Cheolsoo Park Priority: Minor Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-3.patch This is a barebones patch for AvroStorage which enables support of multiple input schemas. The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records. A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.
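As a rough illustration of what "schemas that can be unioned" means for flat records, here is a simplified Python sketch. It is not the logic of the attached patch: `merge_flat_schemas` is a hypothetical helper, schemas are reduced to `{field name: type name}` dicts, and two schemas are treated as mergeable only when shared fields have identical types (real Avro schema resolution is richer than this).

```python
def merge_flat_schemas(left, right):
    """Union two flat record schemas given as {field_name: type_name} dicts.

    Fields present in either input appear in the result. A field present
    in both must have the same type, otherwise the schemas are rejected.
    """
    merged = dict(left)
    for name, type_name in right.items():
        if name in merged and merged[name] != type_name:
            raise ValueError(
                "conflicting types for field %r: %s vs %s"
                % (name, merged[name], type_name)
            )
        merged[name] = type_name
    return merged


# Two input files with overlapping but different flat schemas.
schema1 = {"id": "long", "name": "string"}
schema2 = {"id": "long", "score": "double"}
combined = merge_flat_schemas(schema1, schema2)
```

Records read from a file that lacks one of the merged fields would then need a null (or default) value for that field when loaded.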