[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (29 issues)
Subscriber: pigdaily

Key       Summary
PIG-3297  Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
          https://issues.apache.org/jira/browse/PIG-3297
PIG-3291  TestExampleGenerator fails on Windows because of lack of file name escaping
          https://issues.apache.org/jira/browse/PIG-3291
PIG-3288  Kill jobs if the number of output files is over a configurable limit
          https://issues.apache.org/jira/browse/PIG-3288
PIG-3286  TestPigContext.testImportList fails in trunk
          https://issues.apache.org/jira/browse/PIG-3286
PIG-3285  Jobs using HBaseStorage fail to ship dependency jars
          https://issues.apache.org/jira/browse/PIG-3285
PIG-3281  Pig version in pig.pom is incorrect in branch-0.11
          https://issues.apache.org/jira/browse/PIG-3281
PIG-3258  Patch to allow MultiStorage to use more than one index to generate output tree
          https://issues.apache.org/jira/browse/PIG-3258
PIG-3257  Add unique identifier UDF
          https://issues.apache.org/jira/browse/PIG-3257
PIG-3247  Piggybank functions to mimic OVER clause in SQL
          https://issues.apache.org/jira/browse/PIG-3247
PIG-3223  AvroStorage does not handle comma separated input paths
          https://issues.apache.org/jira/browse/PIG-3223
PIG-3210  Pig fails to start when it cannot write log to log files
          https://issues.apache.org/jira/browse/PIG-3210
PIG-3199  Expose LogicalPlan via PigServer API
          https://issues.apache.org/jira/browse/PIG-3199
PIG-3169  Remove intermediate data after a job finishes
          https://issues.apache.org/jira/browse/PIG-3169
PIG-3166  Update eclipse .classpath according to ivy library.properties
          https://issues.apache.org/jira/browse/PIG-3166
PIG-3123  Simplify Logical Plans By Removing Unnecessary Identity Projections
          https://issues.apache.org/jira/browse/PIG-3123
PIG-3105  Fix TestJobSubmission unit test failure.
          https://issues.apache.org/jira/browse/PIG-3105
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
          https://issues.apache.org/jira/browse/PIG-3069
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
          https://issues.apache.org/jira/browse/PIG-3026
PIG-3025  TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
          https://issues.apache.org/jira/browse/PIG-3025
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-2970  Nested foreach getting incorrect schema when having unrelated inner query
          https://issues.apache.org/jira/browse/PIG-2970
PIG-2959  Add a pig.cmd for Pig to run under Windows
          https://issues.apache.org/jira/browse/PIG-2959
PIG-2955  Fix bunch of Pig e2e tests on Windows
          https://issues.apache.org/jira/browse/PIG-2955
PIG-2873  Converting bin/pig shell script to python
          https://issues.apache.org/jira/browse/PIG-2873
PIG-2248  Pig parser does not detect when a macro name masks a UDF name
          https://issues.apache.org/jira/browse/PIG-2248
PIG-2244  Macros cannot be passed relation names
          https://issues.apache.org/jira/browse/PIG-2244
PIG-1914  Support load/store JSON data in Pig
          https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Updated] (PIG-2641) Create toJSON function for all complex types: tuples, bags and maps
[ https://issues.apache.org/jira/browse/PIG-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-2641:
----------------------------
    Status: Open  (was: Patch Available)

Canceling patch pending changes per Daniel's feedback.

> Create toJSON function for all complex types: tuples, bags and maps
> -------------------------------------------------------------------
>
> Key: PIG-2641
> URL: https://issues.apache.org/jira/browse/PIG-2641
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.12
> Environment: Foggy. Damn foggy.
> Reporter: Russell Jurney
> Assignee: Russell Jurney
> Labels: chararray, fun, happy, input, json, output, pants, pig, piggybank, string, wonderdog
> Fix For: 0.12
>
> Attachments: PIG-2641-2.patch, PIG-2641-3.patch, PIG-2641-4.patch, PIG-2641-5.patch, PIG-2641-6.patch, PIG-2641.patch
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> It is a travesty that there are no UDFs in Piggybank that, given an arbitrary Pig datatype, return a JSON string of same. I intend to fix this problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
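For illustration only, here is a minimal sketch of the recursive serialization such a toJSON UDF needs. It is not the code in any of the attached PIG-2641 patches. To keep it self-contained it assumes stand-ins: java.util.List plays the role of a Pig tuple or bag, and java.util.Map plays a Pig map. The class name ToJsonSketch is hypothetical. One design choice is deliberate here: tuples and bags both become JSON arrays. A real UDF could instead emit tuples as objects keyed by schema field names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: serialize nested values to a JSON string.
 * Stand-ins (assumptions): List for Pig tuples/bags, Map for Pig maps.
 */
class ToJsonSketch {

    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof String) {
            quote((String) value, sb);
        } else if (value instanceof Number || value instanceof Boolean) {
            // NaN/Infinity would produce invalid JSON; not handled in this sketch.
            sb.append(value);
        } else if (value instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                quote(String.valueOf(e.getKey()), sb); // Pig map keys are chararrays
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(',');
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else {
            quote(value.toString(), sb); // fallback: unknown types become strings
        }
    }

    /** Writes s as a JSON string literal, escaping quotes, backslashes and control chars. */
    private static void quote(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
    }
}
```

For example, a tuple (1, [name#pig, tags#(a,b)], null) modeled as nested Lists and a LinkedHashMap serializes to [1,{"name":"pig","tags":["a","b"]},null].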
[jira] [Updated] (PIG-3028) testGrunt dev test needs some command filters to run correctly without cygwin
[ https://issues.apache.org/jira/browse/PIG-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3028:
----------------------------
    Resolution: Fixed
    Status: Resolved  (was: Patch Available)

Patch checked in. Thanks John.

> testGrunt dev test needs some command filters to run correctly without cygwin
> -----------------------------------------------------------------------------
>
> Key: PIG-3028
> URL: https://issues.apache.org/jira/browse/PIG-3028
> Project: Pig
> Issue Type: Sub-task
> Components: build
> Affects Versions: 0.10.0
> Reporter: John Gordon
> Assignee: John Gordon
> Fix For: 0.12
>
> Attachments: PIG-3028.trunk.1.patch
>
>
> TestGrunt still has some commands that depend on cygwin, namely rm -rf. This should be rd /S on Windows. It needs a hook and variable abstraction for OS commands like this.
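The "hook and variable abstraction for OS commands" could look roughly like the sketch below. It is not the checked-in PIG-3028.trunk.1.patch. OsCommands and its methods are hypothetical names. The logic picks the argv for a recursive delete from the JVM's os.name property. Because rd is a cmd.exe builtin, it has to run through cmd /c. /S removes the whole tree and /Q suppresses the confirmation prompt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper: chooses an OS-appropriate command line for
 * recursively deleting a directory, so a test does not hard-code "rm -rf".
 */
class OsCommands {

    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Returns the argv for a recursive delete of path on the given OS. */
    static List<String> recursiveDelete(String osName, String path) {
        if (isWindows(osName)) {
            // rd is a cmd.exe builtin; /S deletes the tree, /Q skips the prompt.
            return Arrays.asList("cmd", "/c", "rd", "/S", "/Q", path);
        }
        return Arrays.asList("rm", "-rf", path);
    }

    /** Convenience overload that uses the current JVM's os.name property. */
    static List<String> recursiveDelete(String path) {
        return recursiveDelete(System.getProperty("os.name"), path);
    }
}
```

A test could then run new ProcessBuilder(OsCommands.recursiveDelete(dir)).inheritIO().start().waitFor() instead of shelling out to rm -rf.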
CfP 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)
We apologize if you receive multiple copies of this message.

===
CALL FOR PAPERS

2013 Workshop on Middleware for HPC and Big Data Systems
MHPC '13

as part of Euro-Par 2013, Aachen, Germany
===

Date: August 27, 2013
Workshop URL: http://m-hpc.org
Springer LNCS

SUBMISSION DEADLINE:
May 31, 2013 - LNCS Full paper submission (rolling abstract submission)
June 28, 2013 - Lightning Talk abstracts

SCOPE
Extremely large, diverse, and complex data sets are generated from scientific applications, the Internet, social media and other applications. Data may be physically distributed and shared by an ever larger community. Collecting, aggregating, storing and analyzing large data volumes presents major challenges. Processing such amounts of data efficiently has been an obstacle to scientific discovery and technological advancement. In addition, making the data accessible, understandable and interoperable involves unsolved problems. Novel middleware architectures, algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the intersection of HPC and Big Data with regard to middleware handling and optimizations. The scope covers existing and proposed middleware for HPC and big data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects, middleware and framework developers, data-intensive application developers as well as users from the scientific and engineering community to exchange their experience in processing large datasets and to report their scientific achievements and innovative ideas. The workshop also offers a dedicated forum for these researchers to assess the state of the art, to discuss problems and requirements, to identify gaps in current and planned designs, and to collaborate on strategies for scalable data-intensive computing.
The workshop will be one day in length, composed of 20-minute paper presentations, each followed by a 10-minute discussion. Presentations may be accompanied by interactive demonstrations.

TOPICS
Topics of interest include, but are not limited to:
- Middleware including: Hadoop, Apache Drill, YARN, Spark/Shark, Hive, Pig, Sqoop, HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyracks
- Data intensive middleware architecture
- Libraries/Frameworks including: Apache Mahout, Giraph, UIMA and GraphLab
- NG Databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis, evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling middleware, focusing on indexing/fast storing or retrieval between compute and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience for popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES
Papers:
Rolling abstract submission
May 31, 2013 - Full paper submission
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification

August 27, 2013 - Workshop Date

TPC CHAIR
Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE
Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenghua Guom, Indiana University, USA
Marcus Hardt, Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung, Karlsruhe Institute of Technology, Germany
Andreas Knüpfer, Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz, Lawrence Livermore National Laboratory
Viral Shah, MIT
[jira] [Updated] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
[ https://issues.apache.org/jira/browse/PIG-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes updated PIG-3297:
------------------------------
    Release Note: Read Avro files that have string fields that were written with avro.java.string = String
    Status: Patch Available  (was: Open)

> Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
> -----------------------------------------------------------------------------------
>
> Key: PIG-3297
> URL: https://issues.apache.org/jira/browse/PIG-3297
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Niels Basjes
> Attachments: PIG-3297-1.patch
>
>
> When an Avro file is created, there is an option to set the "String Type" to a different class than the default Utf8.
> A very common situation is that the "String Type" is set to the standard java.lang.String class.
> When trying to read such an Avro file in Pig using the AvroStorage LoadFunc from the included piggybank, the following exception is thrown:
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)
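The stack trace shows PigAvroDatumReader.readString casting the datum to org.apache.avro.util.Utf8. One general way to avoid that cast, sketched below without Avro on the classpath, is to accept any CharSequence. Both java.lang.String and Avro's Utf8 implement it, so either can be converted with toString(). This is an illustration of the idea under that assumption. It is not the contents of the attached PIG-3297-1.patch, and AvroStringSketch is a hypothetical name. In the test, a StringBuilder stands in for Utf8 as a non-String CharSequence.

```java
/**
 * Hypothetical sketch: normalize an Avro string datum to java.lang.String
 * without assuming it is an org.apache.avro.util.Utf8.
 */
class AvroStringSketch {

    static String toPigChararray(Object datum) {
        if (datum == null) {
            return null;
        }
        if (datum instanceof CharSequence) {
            // Covers java.lang.String (avro.java.string = String)
            // as well as org.apache.avro.util.Utf8 (the default).
            return datum.toString();
        }
        throw new IllegalArgumentException(
            "Expected a string datum but got " + datum.getClass().getName());
    }
}
```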
[jira] [Updated] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
[ https://issues.apache.org/jira/browse/PIG-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes updated PIG-3297:
------------------------------
    Attachment: PIG-3297-1.patch

The patch I created.

> Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
> -----------------------------------------------------------------------------------
>
> Key: PIG-3297
> URL: https://issues.apache.org/jira/browse/PIG-3297
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Niels Basjes
> Attachments: PIG-3297-1.patch
>
>
> When an Avro file is created, there is an option to set the "String Type" to a different class than the default Utf8.
> A very common situation is that the "String Type" is set to the standard java.lang.String class.
> When trying to read such an Avro file in Pig using the AvroStorage LoadFunc from the included piggybank, the following exception is thrown:
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)
[jira] [Commented] (PIG-3295) Casting from bytearray failing after Union (even when each field is from a single Loader)
[ https://issues.apache.org/jira/browse/PIG-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642170#comment-13642170 ]

Koji Noguchi commented on PIG-3295:
----------------------------------

Forgot to mention: I didn't fix the PIG-3293 case, but I updated the error message to indicate that it could come from a union with multiple loaders.

> Casting from bytearray failing after Union (even when each field is from a single Loader)
> -----------------------------------------------------------------------------------------
>
> Key: PIG-3295
> URL: https://issues.apache.org/jira/browse/PIG-3295
> Project: Pig
> Issue Type: Bug
> Components: parser
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Attachments: pig-3295-v01.patch
>
>
> One example
> {noformat}
> A = load 'data1.txt' as line:bytearray;
> B = load 'c1.txt' using TextLoader() as cookie1;
> C = load 'c2.txt' using TextLoader() as cookie2;
> B2 = join A by line, B by cookie1;
> C2 = join A by line, C by cookie2;
> D = union onschema B2,C2; -- D: {A::line: bytearray,B::cookie1: bytearray,C::cookie2: bytearray}
> E = foreach D generate (chararray) line, (chararray) cookie1, (chararray) cookie2;
> dump E;
> {noformat}
> This script fails at runtime with
> "Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string."
> This is different from PIG-3293 in that each field in 'D' belongs to a single loader, whereas in PIG-3293 it came from multiple loaders.
[jira] [Updated] (PIG-3295) Casting from bytearray failing after Union (even when each field is from a single Loader)
[ https://issues.apache.org/jira/browse/PIG-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-3295:
------------------------------
    Attachment: pig-3295-v01.patch

Attaching an initial patch. Instead of having one FuncSpec per LOUnion (PIG-2493), it checks each field and sets a different FuncSpec where possible.

> Casting from bytearray failing after Union (even when each field is from a single Loader)
> -----------------------------------------------------------------------------------------
>
> Key: PIG-3295
> URL: https://issues.apache.org/jira/browse/PIG-3295
> Project: Pig
> Issue Type: Bug
> Components: parser
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Attachments: pig-3295-v01.patch
>
>
> One example
> {noformat}
> A = load 'data1.txt' as line:bytearray;
> B = load 'c1.txt' using TextLoader() as cookie1;
> C = load 'c2.txt' using TextLoader() as cookie2;
> B2 = join A by line, B by cookie1;
> C2 = join A by line, C by cookie2;
> D = union onschema B2,C2; -- D: {A::line: bytearray,B::cookie1: bytearray,C::cookie2: bytearray}
> E = foreach D generate (chararray) line, (chararray) cookie1, (chararray) cookie2;
> dump E;
> {noformat}
> This script fails at runtime with
> "Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string."
> This is different from PIG-3293 in that each field in 'D' belongs to a single loader, whereas in PIG-3293 it came from multiple loaders.
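The "per field instead of per LOUnion" idea can be pictured with the toy model below. It is not Pig's LOUnion or FuncSpec code. Each union input is modeled as a map from output field name to the loader (a plain String) that produced the field. A field gets a caster only when exactly one loader feeds it across all inputs. Otherwise it stays unresolved, which corresponds to the multiple-loader PIG-3293 situation. UnionCasterSketch and resolvePerField are hypothetical names. In the script above, A::line comes only from A's loader, B::cookie1 only from B's TextLoader and C::cookie2 only from C's. Every field therefore resolves, even though the union as a whole mixes loaders.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of per-field caster resolution for a union.
 * Each input: output field name -> loader spec that produced the field.
 */
class UnionCasterSketch {

    static Map<String, String> resolvePerField(List<Map<String, String>> inputs) {
        // Collect every distinct loader that contributes to each field.
        Map<String, Set<String>> loadersByField = new LinkedHashMap<>();
        for (Map<String, String> input : inputs) {
            for (Map.Entry<String, String> e : input.entrySet()) {
                loadersByField.computeIfAbsent(e.getKey(), k -> new HashSet<>()).add(e.getValue());
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : loadersByField.entrySet()) {
            Set<String> loaders = e.getValue();
            // Unambiguous only when a single loader feeds the field;
            // null marks the multiple-loader case.
            result.put(e.getKey(), loaders.size() == 1 ? loaders.iterator().next() : null);
        }
        return result;
    }
}
```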
[jira] [Commented] (PIG-1713) SAMPLE command should accept parameters to specify alternative sampling algorithm
[ https://issues.apache.org/jira/browse/PIG-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642146#comment-13642146 ]

Vicki Fu commented on PIG-1713:
------------------------------

Hi, I have added this ticket to my GSoC 2013 proposal. My first draft is here; would you please give me some feedback?
http://vickifu.info/?p=29

> SAMPLE command should accept parameters to specify alternative sampling algorithm
> ---------------------------------------------------------------------------------
>
> Key: PIG-1713
> URL: https://issues.apache.org/jira/browse/PIG-1713
> Project: Pig
> Issue Type: Improvement
> Reporter: Viraj Bhat
> Labels: gsoc2012
>
> I have a script which takes in a command line parameter.
> {code}
> pig -p number=100 script.pig
> {code}
> The script contains the following parameters:
> {code}
> A = load '/user/viraj/test' using PigStorage() as (a,b,c);
> B = SAMPLE A 1/$number;
> dump B;
> {code}
> Realistic use cases of SAMPLE require statisticians to calculate SAMPLE data on demand.
> Ideally I would like to calculate SAMPLE from within the Pig script without having to run one Pig script first to get its results and then another to pass those results in.
> Ideal use case:
> {code}
> A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
> ...
> ...
> W = group X by col1;
> Z = foreach Y generate AVG(X);
> AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
> BB = SAMPLE AA 1/Z;
> dump BB;
> {code}
> Viraj
> Change this Jira to only track sampling algorithm. PIG-1926 is opened to track limit/sample taking scalar.
> This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012
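To make "alternative sampling algorithm" concrete, the sketch below puts two strategies behind one hypothetical Sampler interface. BernoulliSampler keeps each record independently with probability p, so the output size varies from run to run. This matches the probabilistic behavior Pig documents for SAMPLE, though it makes no claim about SAMPLE's actual implementation. ReservoirSampler uses reservoir sampling (Algorithm R) and returns exactly min(k, n) records chosen uniformly in one pass. All class names are hypothetical, and seeds are fixed only to make runs repeatable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

/** Hypothetical pluggable sampling strategy for a SAMPLE-like operator. */
interface Sampler<T> {
    List<T> sample(Iterable<T> input);
}

/** Keeps each record independently with probability p; output size varies. */
class BernoulliSampler<T> implements Sampler<T> {
    private final double p;
    private final Random rnd;

    BernoulliSampler(double p, long seed) {
        this.p = p;
        this.rnd = new Random(seed);
    }

    public List<T> sample(Iterable<T> input) {
        List<T> out = new ArrayList<>();
        for (T t : input) {
            if (rnd.nextDouble() < p) {
                out.add(t);
            }
        }
        return out;
    }
}

/** Reservoir sampling (Algorithm R): exactly min(k, n) records, uniformly chosen. */
class ReservoirSampler<T> implements Sampler<T> {
    private final int k;
    private final Random rnd;

    ReservoirSampler(int k, long seed) {
        this.k = k;
        this.rnd = new Random(seed);
    }

    public List<T> sample(Iterable<T> input) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T t : input) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(t);           // fill the reservoir first
            } else {
                int j = rnd.nextInt(seen);  // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, t);    // happens with probability k / seen
                }
            }
        }
        return reservoir;
    }
}
```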
[jira] [Commented] (PIG-3221) Bootstrap sampling
[ https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642142#comment-13642142 ]

Vicki Fu commented on PIG-3221:
------------------------------

Hi Gianmarco,
I have finished the first draft of my GSoC 2013 proposal. Would you please give me some feedback?
http://vickifu.info/?p=29
Thanks,
Vicky

> Bootstrap sampling
> ------------------
>
> Key: PIG-3221
> URL: https://issues.apache.org/jira/browse/PIG-3221
> Project: Pig
> Issue Type: New Feature
> Reporter: Gianmarco De Francisci Morales
> Labels: gsoc2013
>
> Implement a bootstrap sampling option ( http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE operator.
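Bootstrap sampling differs from SAMPLE's fraction-based selection in one key way. A bootstrap resample draws n records with replacement from an n-record input, so a record can appear several times. The in-memory sketch below shows one resample plus the common use of repeating it B times to see how a statistic (here the mean) varies. Bootstrap, resample and bootstrapMeans are hypothetical names, not a proposed Pig API. A distributed implementation would probably avoid random access to all n records. One known approach is the Poisson bootstrap, which emits each record Poisson(1) times in a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical in-memory sketch of bootstrap resampling. */
class Bootstrap {

    /** One bootstrap resample: n uniform draws with replacement. */
    static <T> List<T> resample(List<T> data, Random rnd) {
        int n = data.size();
        List<T> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(data.get(rnd.nextInt(n))); // the same record may be drawn repeatedly
        }
        return out;
    }

    /** Means of b resamples; their spread approximates the sampling error of the mean. */
    static double[] bootstrapMeans(List<Double> data, int b, long seed) {
        Random rnd = new Random(seed);
        double[] means = new double[b];
        for (int r = 0; r < b; r++) {
            double sum = 0;
            for (double x : resample(data, rnd)) {
                sum += x;
            }
            means[r] = sum / data.size();
        }
        return means;
    }
}
```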
[jira] [Commented] (PIG-3177) Fix Pig project SEO so latest, 0.11 docs show when you google things
[ https://issues.apache.org/jira/browse/PIG-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642012#comment-13642012 ]

Mark Wagner commented on PIG-3177:
---------------------------------

[~russell.jurney], I think the proper way to do this is to use a sitemap.xml: http://www.sitemaps.org/protocol.html. We can promote the latest docs by giving them a higher 'priority' tag. Reading through articles, it's not clear to me whether it's possible to just specify http://pig.apache.org/docs/r0.11.1 to have a given priority, and all the docs underneath it will inherit that, or if we need to enumerate every page.

> Fix Pig project SEO so latest, 0.11 docs show when you google things
> --------------------------------------------------------------------
>
> Key: PIG-3177
> URL: https://issues.apache.org/jira/browse/PIG-3177
> Project: Pig
> Issue Type: Bug
> Components: site
> Affects Versions: 0.11
> Reporter: Russell Jurney
> Assignee: Russell Jurney
> Priority: Critical
> Fix For: 0.12
>
>
> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html
> The 0.7.0 docs are what everyone references. FOR POOPS SAKES.
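In the sitemaps.org protocol, <priority> is a child of each individual <url> entry (a value from 0.0 to 1.0, default 0.5). As far as the protocol text goes, there is no directory-level inheritance, which points toward enumerating every page. The sketch below generates such a file and boosts URLs under a "current docs" prefix. SitemapSketch is a hypothetical name, and the priorities 1.0 and 0.1 are arbitrary illustrative choices. It assumes the page list comes from somewhere else, such as a walk of the generated docs tree.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sitemap generator that boosts pages under the current docs release. */
class SitemapSketch {

    static String build(List<String> urls, String currentDocsPrefix) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            // Priority is set per <url> entry, so every page gets its own value.
            String priority = url.startsWith(currentDocsPrefix) ? "1.0" : "0.1";
            sb.append("  <url>\n");
            sb.append("    <loc>").append(escapeXml(url)).append("</loc>\n");
            sb.append("    <priority>").append(priority).append("</priority>\n");
            sb.append("  </url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    /** Escapes the five XML special characters (sitemaps require entity-escaped URLs). */
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }
}
```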
[jira] [Updated] (PIG-3291) TestExampleGenerator fails on Windows because of lack of file name escaping
[ https://issues.apache.org/jira/browse/PIG-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Wannemacher updated PIG-3291:
-----------------------------------
    Issue Type: Sub-task  (was: Bug)
    Parent: PIG-2793

> TestExampleGenerator fails on Windows because of lack of file name escaping
> ---------------------------------------------------------------------------
>
> Key: PIG-3291
> URL: https://issues.apache.org/jira/browse/PIG-3291
> Project: Pig
> Issue Type: Sub-task
> Affects Versions: 0.12
> Environment: Windows
> Reporter: David Wannemacher
> Fix For: 0.12
>
> Attachments: PIG-3291.trunk.patch
>
>
> On Windows, all tests fail with an exception like this:
> Testcase: testFilterGroupCountStore took 0.022 sec
>     Caused an ERROR
> Error during parsing. Unexpected character 'S'
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unexpected character 'S'
>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1669)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1607)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:563)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:576)
>     at org.apache.pig.test.TestExampleGenerator.testFilterGroupCountStore(TestExampleGenerator.java:394)
> Caused by: Failed to parse: Unexpected character 'S'
>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:235)
>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:174)
>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1660)
> Looks like a change in https://issues.apache.org/jira/browse/PIG-2170 caused the file names to stop being escaped properly.
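A plausible reading of "Unexpected character 'S'" is that a Windows path containing a segment like \S... was pasted into a single-quoted Pig string literal. There the backslash starts an escape sequence the lexer does not accept. That diagnosis is an assumption, not something the report confirms. If it holds, escaping the path before building the query avoids the problem. The sketch below escapes backslashes first and then single quotes. PigPathQuoting is a hypothetical helper and is not the attached PIG-3291.trunk.patch. Converting separators to '/' would be another option.

```java
/** Hypothetical helper: make a local file path safe inside a Pig single-quoted string. */
class PigPathQuoting {

    /** Escapes backslashes (first!) and single quotes, then wraps the path in quotes. */
    static String quote(String path) {
        String escaped = path.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }
}
```

A test would then build its query as, for example, pigServer.registerQuery("A = load " + PigPathQuoting.quote(file.getAbsolutePath()) + ";").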
[jira] [Reopened] (PIG-3164) Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.
[ https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reopened PIG-3164:
-----------------------------

Backed these changes out; I should never have checked them in. I missed that this was only in test and not in main, so I ended up compiling the wrong thing to make sure this worked. UDFs should not be added under piggybank/java/src/test. That's for unit tests for the UDF. The UDFs should be under piggybank/java/src/main. Thanks Niels for catching my mistake.

> Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.
> ----------------------------------------------------------------------------------------------------------
>
> Key: PIG-3164
> URL: https://issues.apache.org/jira/browse/PIG-3164
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.10.0
> Reporter: Anuroopa George
> Assignee: Anuroopa George
> Fix For: 0.12
>
> Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java
>
>
> Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix. This UDF returns true if the character sequence represented by the string argument given as a suffix is a suffix of the character sequence represented by the given string; false otherwise. Also, true will be returned if the given suffix is an empty string or is equal to the given String.
[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3010:
----------------------------
    Status: Open  (was: Patch Available)

Patch no longer applies. This causes review board to not show the diffs either. Sorry for waiting so long on this.

> Allow UDF's to flatten themselves
> ---------------------------------
>
> Key: PIG-3010
> URL: https://issues.apache.org/jira/browse/PIG-3010
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3010-0.patch, PIG-3010-1.patch, PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch, PIG-3010-3_nows.patch, PIG-3010-3.patch, PIG-3010-4_nows.patch, PIG-3010-4.patch, PIG-3010-5_nows.patch, PIG-3010-5.patch
>
>
> This is something I thought would be cool for a while, so I sat down and did it because I think there are some useful debugging tools it'd help with.
> The idea is that if you attach an annotation to a UDF, the Tuple or DataBag you output will be flattened. This is quite powerful. A very common pattern is:
> a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
> This would let you just do:
> a = foreach data generate MyUdf(thing);
> With the exact same result!
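The annotation-driven idea can be modeled in plain Java. A runtime-retained marker annotation on the UDF class lets a planner check, via reflection, whether to splice the UDF's output fields into the row, as FLATTEN(MyUdf(thing)) would. Otherwise the output stays as one nested field. Everything here is hypothetical: FlattenOutput, SplitUdf and FlattenSketch are illustrative names, not the annotation in the PIG-3010 patches. A List stands in for Pig's Tuple. Only the tuple case is shown. Flattening a bag would instead produce one output row per bag element.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical marker: this UDF's tuple output should be flattened automatically. */
@Retention(RetentionPolicy.RUNTIME)   // must survive to runtime for reflection
@Target(ElementType.TYPE)
@interface FlattenOutput {}

/** Hypothetical UDF returning a "tuple" (a List stands in for Pig's Tuple). */
@FlattenOutput
class SplitUdf {
    List<Object> exec(String s) {
        return new ArrayList<Object>(Arrays.asList(s.split(",")));
    }
}

class FlattenSketch {
    /**
     * Mimics a planner decision: if the UDF class carries the marker, splice
     * the returned tuple's fields into the output row; otherwise nest it.
     */
    static List<Object> project(Object udf, List<Object> udfResult) {
        List<Object> row = new ArrayList<>();
        if (udf.getClass().isAnnotationPresent(FlattenOutput.class)) {
            row.addAll(udfResult);   // like FLATTEN(MyUdf(thing))
        } else {
            row.add(udfResult);      // like MyUdf(thing): one nested field
        }
        return row;
    }
}
```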
[jira] [Commented] (PIG-3164) Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.
[ https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641898#comment-13641898 ]

Niels Basjes commented on PIG-3164:
----------------------------------

This code contains this line:

public class ENDSWITH extends EvalFunc throws ExecException {

A class cannot throw anything. Please reopen and fix.

> Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.
> ----------------------------------------------------------------------------------------------------------
>
> Key: PIG-3164
> URL: https://issues.apache.org/jira/browse/PIG-3164
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.10.0
> Reporter: Anuroopa George
> Assignee: Anuroopa George
> Fix For: 0.12
>
> Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java
>
>
> Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix. This UDF returns true if the character sequence represented by the string argument given as a suffix is a suffix of the character sequence represented by the given string; false otherwise. Also, true will be returned if the given suffix is an empty string or is equal to the given String.
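Niels's point is that a throws clause belongs on a method, not on a class declaration. In a piggybank UDF the class would be declared roughly as `public class ENDSWITH extends EvalFunc<Boolean>`, with the clause on `public Boolean exec(Tuple input) throws IOException`. ExecException is a subclass of IOException. The sketch below keeps that shape but stays self-contained: a List stands in for Pig's Tuple, so it is not the real EvalFunc API. EndsWithSketch is a hypothetical name. The described semantics need no special cases, because java.lang.String.endsWith already returns true for an empty suffix and for a suffix equal to the whole string. Returning null on null input is a common UDF convention, assumed here.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the ENDSWITH logic with the throws clause on the method.
 * A List stands in for Pig's Tuple so this compiles without Pig.
 */
class EndsWithSketch {

    static Boolean exec(List<?> input) throws IOException {
        if (input == null || input.size() != 2) {
            throw new IOException("ENDSWITH expects (string, suffix)");
        }
        Object str = input.get(0);
        Object suffix = input.get(1);
        if (str == null || suffix == null) {
            return null; // propagate null input as null output
        }
        // String.endsWith is true for an empty suffix and for an equal string.
        return str.toString().endsWith(suffix.toString());
    }
}
```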
[jira] [Commented] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
[ https://issues.apache.org/jira/browse/PIG-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641891#comment-13641891 ]

Niels Basjes commented on PIG-3297:
----------------------------------

I have a working fix that I'll submit shortly.

> Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
> -----------------------------------------------------------------------------------
>
> Key: PIG-3297
> URL: https://issues.apache.org/jira/browse/PIG-3297
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Niels Basjes
>
> When an Avro file is created, there is an option to set the "String Type" to a different class than the default Utf8.
> A very common situation is that the "String Type" is set to the standard java.lang.String class.
> When trying to read such an Avro file in Pig using the AvroStorage LoadFunc from the included piggybank, the following exception is thrown:
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)
[jira] [Commented] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
[ https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641590#comment-13641590 ]

Jonathan Coveney commented on PIG-3215:
--------------------------------------

Do it!

> [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
> ----------------------------------------------------------------------------
>
> Key: PIG-3215
> URL: https://issues.apache.org/jira/browse/PIG-3215
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Reporter: MIYAKAWA Taku
> Assignee: MIYAKAWA Taku
> Labels: piggybank
> Attachments: LTSVLoader-6.html, LTSVLoader.html, PIG-3215-6.patch, PIG-3215.patch
>
>
> LTSV, or Labeled Tab-separated Values format, is now getting popular in Japan for log files, especially those of web servers. The goal of this jira is to add LTSVLoader in PiggyBank to load LTSV files.
> LTSV is based on TSV, thus columns are separated by tab characters. Additionally, each column includes a label and a value, separated by a ":" character.
> Read about LTSV on http://ltsv.org/.
> h4. Example LTSV file (access.log)
> Columns are separated by tab characters.
> {noformat}
> host:host1.example.org	req:GET /index.html	ua:Opera/9.80
> host:host1.example.org	req:GET /favicon.ico	ua:Opera/9.80
> host:pc.example.com	req:GET /news.html	ua:Mozilla/5.0
> {noformat}
> h4. Usage 1: Extract fields from each line
> Users can specify an input schema and get columns as Pig fields.
> This example loads the LTSV file shown in the previous section.
> {code}
> -- Parses the access log and counts the number of lines
> -- for each pair of the host column and the ua column.
> access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader('host:chararray, ua:chararray');
> grouped_access = GROUP access BY (host, ua);
> count_for_host_ua = FOREACH grouped_access GENERATE group.host, group.ua, COUNT(access);
> DUMP count_for_host_ua;
> {code}
> The following text will be printed out.
> {noformat}
> (host1.example.org,Opera/9.80,2)
> (pc.example.com,Mozilla/5.0,1)
> {noformat}
> h4. Usage 2: Extract a map from each line
> Users can get a map for each LTSV line. The key of a map is a label of the LTSV column. The value of a map comes from characters after ":" in the LTSV column.
> {code}
> -- Parses the access log and projects the user agent field.
> access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (m:map[]);
> user_agent = FOREACH access GENERATE m#'ua' AS ua;
> DUMP user_agent;
> {code}
> The following text will be printed out.
> {noformat}
> (Opera/9.80)
> (Opera/9.80)
> (Mozilla/5.0)
> {noformat}
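The LTSV rules quoted above fit in a few lines of parsing: split the line on tabs, then split each column at its first ':' into label and value. Splitting at the first colon only matters because values such as URLs or times can contain ':'. The sketch below is a self-contained illustration of those rules. It is not the LTSVLoader from the attached patches, whose handling of malformed columns may differ. LtsvSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LTSV line parser sketch: tab-separated "label:value" columns. */
class LtsvSketch {

    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String column : line.split("\t", -1)) {
            int colon = column.indexOf(':');
            if (colon < 0) {
                continue; // no label separator: skipped here (a real loader might warn)
            }
            // Split on the FIRST ':' only, so a value like "a:b" stays intact.
            fields.put(column.substring(0, colon), column.substring(colon + 1));
        }
        return fields;
    }
}
```

Applied to the third line of the example access.log, parseLine returns {host=pc.example.com, req=GET /news.html, ua=Mozilla/5.0}. That is the same map Usage 2 projects m#'ua' from.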
[jira] [Created] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
Niels Basjes created PIG-3297:
------------------------------

    Summary: Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
    Key: PIG-3297
    URL: https://issues.apache.org/jira/browse/PIG-3297
    Project: Pig
    Issue Type: Bug
    Components: piggybank
    Affects Versions: 0.11.1
    Reporter: Niels Basjes


When an Avro file is created, there is an option to set the "String Type" to a different class than the default Utf8.
A very common situation is that the "String Type" is set to the standard java.lang.String class.
When trying to read such an Avro file in Pig using the AvroStorage LoadFunc from the included piggybank, the following exception is thrown:
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)