[jira] Subscription: PIG patch available

2013-04-25 Thread jira
Issue Subscription
Filter: PIG patch available (29 issues)

Subscriber: pigdaily

Key         Summary
PIG-3297    Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
            https://issues.apache.org/jira/browse/PIG-3297
PIG-3291    TestExampleGenerator fails on Windows because of lack of file name escaping
            https://issues.apache.org/jira/browse/PIG-3291
PIG-3288    Kill jobs if the number of output files is over a configurable limit
            https://issues.apache.org/jira/browse/PIG-3288
PIG-3286    TestPigContext.testImportList fails in trunk
            https://issues.apache.org/jira/browse/PIG-3286
PIG-3285    Jobs using HBaseStorage fail to ship dependency jars
            https://issues.apache.org/jira/browse/PIG-3285
PIG-3281    Pig version in pig.pom is incorrect in branch-0.11
            https://issues.apache.org/jira/browse/PIG-3281
PIG-3258    Patch to allow MultiStorage to use more than one index to generate output tree
            https://issues.apache.org/jira/browse/PIG-3258
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3223    AvroStorage does not handle comma separated input paths
            https://issues.apache.org/jira/browse/PIG-3223
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3169    Remove intermediate data after a job finishes
            https://issues.apache.org/jira/browse/PIG-3169
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3105    Fix TestJobSubmission unit test failure.
            https://issues.apache.org/jira/browse/PIG-3105
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-2970    Nested foreach getting incorrect schema when having unrelated inner query
            https://issues.apache.org/jira/browse/PIG-2970
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2248    Pig parser does not detect when a macro name masks a UDF name
            https://issues.apache.org/jira/browse/PIG-2248
PIG-2244    Macros cannot be passed relation names
            https://issues.apache.org/jira/browse/PIG-2244
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-2641) Create toJSON function for all complex types: tuples, bags and maps

2013-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2641:


Status: Open  (was: Patch Available)

Canceling patch pending changes per Daniel's feedback.

> Create toJSON function for all complex types: tuples, bags and maps
> ---
>
> Key: PIG-2641
> URL: https://issues.apache.org/jira/browse/PIG-2641
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.12
> Environment: Foggy. Damn foggy.
> Reporter: Russell Jurney
> Assignee: Russell Jurney
> Labels: chararray, fun, happy, input, json, output, pants, pig, piggybank, string, wonderdog
> Fix For: 0.12
>
> Attachments: PIG-2641-2.patch, PIG-2641-3.patch, PIG-2641-4.patch, PIG-2641-5.patch, PIG-2641-6.patch, PIG-2641.patch
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> It is a travesty that there are no UDFs in Piggybanks that, given an 
> arbitrary Pig datatype, return a JSON string of same. I intend to fix this 
> problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3028) testGrunt dev test needs some command filters to run correctly without cygwin

2013-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3028:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks John.

> testGrunt dev test needs some command filters to run correctly without cygwin
> -
>
> Key: PIG-3028
> URL: https://issues.apache.org/jira/browse/PIG-3028
> Project: Pig
> Issue Type: Sub-task
> Components: build
> Affects Versions: 0.10.0
> Reporter: John Gordon
> Assignee: John Gordon
> Fix For: 0.12
>
> Attachments: PIG-3028.trunk.1.patch
>
>
> TestGrunt still has some commands that depend on Cygwin, namely rm -rf. On 
> Windows this should be rd /S. It needs a hook and variable abstraction for OS 
> commands like this.
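
The "hook and variable abstraction" asked for above could look roughly like the following. This is only a sketch in Python, not the patch that was checked in; the function name remove_tree_command is hypothetical, and the /Q flag (suppress the confirmation prompt) is an addition that the issue text does not mention.

```python
import os


def remove_tree_command(path, os_name=None):
    """Return the argv list that recursively deletes `path` on this OS.

    `os_name` defaults to os.name; passing it explicitly makes the
    choice testable on any platform.
    """
    os_name = os.name if os_name is None else os_name
    if os_name == "nt":
        # rd is a cmd.exe builtin, so it has to be run through cmd /c.
        # /S removes the whole tree; /Q (an assumption here) skips the prompt.
        return ["cmd", "/c", "rd", "/S", "/Q", path]
    # Everywhere else (including Cygwin) keep the existing rm -rf.
    return ["rm", "-rf", path]
```

A test would then ask this helper for the command instead of hard-coding rm -rf.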



CfP 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)

2013-04-25 Thread MHPC 2013
we apologize if you receive multiple copies of this message
===

CALL FOR PAPERS

2013 Workshop on

Middleware for HPC and Big Data Systems

MHPC '13

as part of Euro-Par 2013, Aachen, Germany

===

Date: August 27, 2013

Workshop URL: http://m-hpc.org

Springer LNCS

SUBMISSION DEADLINE:

May 31, 2013 - LNCS Full paper submission (rolling abstract submission)
June 28, 2013 - Lightning Talk abstracts


SCOPE

Extremely large, diverse, and complex data sets are generated from
scientific applications, the Internet, social media and other applications.
Data may be physically distributed and shared by an ever larger community.
Collecting, aggregating, storing and analyzing large data volumes
presents major challenges. Processing such amounts of data efficiently
remains an obstacle to scientific discovery and technological
advancement. In addition, making the data accessible, understandable and
interoperable still poses unsolved problems. Novel middleware architectures,
algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the
intersection of HPC and Big Data with regard to middleware handling
and optimizations. The scope covers existing and proposed middleware for HPC
and big data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects,
middleware and framework developers, data-intensive application developers
as well as users from the scientific and engineering community to exchange
their experience in processing large datasets and to report their scientific
achievements and innovative ideas. The workshop also offers a dedicated forum
for these researchers to assess the state of the art, to discuss problems
and requirements, to identify gaps in current and planned designs, and to
collaborate in strategies for scalable data-intensive computing.

The workshop will be one day in length, composed of 20 min paper
presentations, each followed by 10 min discussion sections.
Presentations may be accompanied by interactive demonstrations.


TOPICS

Topics of interest include, but are not limited to:

- Middleware including: Hadoop, Apache Drill, YARN, Spark/Shark, Hive, Pig,
Sqoop, HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyracks
- Data intensive middleware architecture
- Libraries/Frameworks including: Apache Mahout, Giraph, UIMA and GraphLab
- NG Databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the
middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis, evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling
middleware, focusing on indexing/fast storing or retrieval between compute
and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience for popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES

Papers:
Rolling abstract submission
May 31, 2013 - Full paper submission
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification

August 27, 2013 - Workshop Date


TPC

CHAIR

Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE

Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenghua Guom, Indiana University, USA
Marcus Hardt, Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung, Karlsruhe Institute of Technology, Germany
Andreas Knüpfer, Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz, Lawrence Livermore National Laboratory
Viral Shah, MIT

[jira] [Updated] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc

2013-04-25 Thread Niels Basjes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Basjes updated PIG-3297:
--

Release Note: Read Avro files that have string fields that were written with avro.java.string = String
Status: Patch Available  (was: Open)

> Avro files with stringType set to String cannot be read by the AvroStorage 
> LoadFunc
> ---
>
> Key: PIG-3297
> URL: https://issues.apache.org/jira/browse/PIG-3297
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Niels Basjes
> Attachments: PIG-3297-1.patch
>
>
> When an Avro file is created, there is an option to set the "String Type" 
> to a class other than the default Utf8.
> A very common choice is to set the "String Type" to the standard 
> java.lang.String class.
> Reading such an Avro file in Pig with the AvroStorage LoadFunc from the 
> included piggybank throws the following exception:
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)



[jira] [Updated] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc

2013-04-25 Thread Niels Basjes (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Basjes updated PIG-3297:
--

Attachment: PIG-3297-1.patch

The patch I created.

> Avro files with stringType set to String cannot be read by the AvroStorage 
> LoadFunc
> ---
>
> Key: PIG-3297
> URL: https://issues.apache.org/jira/browse/PIG-3297
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Niels Basjes
> Attachments: PIG-3297-1.patch
>
>
> When an Avro file is created, there is an option to set the "String Type" 
> to a class other than the default Utf8.
> A very common choice is to set the "String Type" to the standard 
> java.lang.String class.
> Reading such an Avro file in Pig with the AvroStorage LoadFunc from the 
> included piggybank throws the following exception:
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)



[jira] [Commented] (PIG-3295) Casting from bytearray failing after Union (even when each field is from a single Loader)

2013-04-25 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642170#comment-13642170
 ] 

Koji Noguchi commented on PIG-3295:
---

Forgot to mention: I didn't fix the PIG-3293 case, but I updated the error 
message to indicate that it could come from a Union with multiple loaders.

> Casting from bytearray failing after Union (even when each field is from a 
> single Loader)
> -
>
> Key: PIG-3295
> URL: https://issues.apache.org/jira/browse/PIG-3295
> Project: Pig
> Issue Type: Bug
> Components: parser
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Attachments: pig-3295-v01.patch
>
>
> One example
> {noformat}
> A = load 'data1.txt' as line:bytearray;
> B = load 'c1.txt' using TextLoader() as cookie1;
> C = load 'c2.txt' using TextLoader() as cookie2;
> B2 = join A by line, B by cookie1;
> C2 = join A by line, C by cookie2;
> D = union onschema B2,C2; -- D: {A::line: bytearray,B::cookie1: bytearray,C::cookie2: bytearray}
> E = foreach D generate (chararray) line, (chararray) cookie1, (chararray) cookie2;
> dump E;
> {noformat}
> This script fails at runtime with 
> "Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: 
> Received a bytearray from the UDF. Cannot determine how to convert the 
> bytearray to string."
> This differs from PIG-3293 in that each field in 'D' belongs to a single 
> loader, whereas in PIG-3293 a field came from multiple loaders.



[jira] [Updated] (PIG-3295) Casting from bytearray failing after Union (even when each field is from a single Loader)

2013-04-25 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3295:
--

Attachment: pig-3295-v01.patch

Attaching an initial patch.
Instead of keeping one FuncSpec per LOUnion (PIG-2493), it checks each field 
and sets a different FuncSpec where possible.
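
The per-field idea can be illustrated with a small Python sketch. It is not the attached patch and does not use Pig's classes; the function name per_field_loader and the dict-based schema representation are hypothetical. Each union input is modelled as a mapping from field name to the loader that produced it, and a field keeps a loader only if every input agrees on it.

```python
def per_field_loader(union_inputs):
    """Pick one loader per output field of a union, or None on conflict.

    union_inputs: list of dicts, one per union input, mapping
    field name -> name of the loader the field came from.
    """
    chosen = {}
    for schema in union_inputs:
        for field, loader in schema.items():
            if field not in chosen:
                chosen[field] = loader
            elif chosen[field] != loader:
                # Same field fed by different loaders (the PIG-3293 case):
                # no single caster can be picked for it.
                chosen[field] = None
    return chosen


# The script from this issue: 'line' uses the default loader (PigStorage),
# the cookie fields use TextLoader.
b2 = {"line": "PigStorage", "cookie1": "TextLoader"}
c2 = {"line": "PigStorage", "cookie2": "TextLoader"}
print(per_field_loader([b2, c2]))
```

With one choice per union instead of one per field, the mix of PigStorage and TextLoader above would already count as a conflict, which matches the failure described in this issue.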

> Casting from bytearray failing after Union (even when each field is from a 
> single Loader)
> -
>
> Key: PIG-3295
> URL: https://issues.apache.org/jira/browse/PIG-3295
> Project: Pig
> Issue Type: Bug
> Components: parser
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Attachments: pig-3295-v01.patch
>
>
> One example
> {noformat}
> A = load 'data1.txt' as line:bytearray;
> B = load 'c1.txt' using TextLoader() as cookie1;
> C = load 'c2.txt' using TextLoader() as cookie2;
> B2 = join A by line, B by cookie1;
> C2 = join A by line, C by cookie2;
> D = union onschema B2,C2; -- D: {A::line: bytearray,B::cookie1: bytearray,C::cookie2: bytearray}
> E = foreach D generate (chararray) line, (chararray) cookie1, (chararray) cookie2;
> dump E;
> {noformat}
> This script fails at runtime with 
> "Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: 
> Received a bytearray from the UDF. Cannot determine how to convert the 
> bytearray to string."
> This differs from PIG-3293 in that each field in 'D' belongs to a single 
> loader, whereas in PIG-3293 a field came from multiple loaders.



[jira] [Commented] (PIG-1713) SAMPLE command should accept parameters to specify alternative sampling algorithm

2013-04-25 Thread Vicki Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642146#comment-13642146
 ] 

Vicki Fu commented on PIG-1713:
---

Hi, I have added this ticket to my GSoC 2013 proposal.
My first draft is here; could you please give me some feedback?
http://vickifu.info/?p=29

> SAMPLE command should accept parameters to specify alternative sampling 
> algorithm
> -
>
> Key: PIG-1713
> URL: https://issues.apache.org/jira/browse/PIG-1713
> Project: Pig
> Issue Type: Improvement
> Reporter: Viraj Bhat
> Labels: gsoc2012
>
> I have a script which takes in a command line parameter.
> {code}
> pig -p number=100 script.pig
> {code}
> The script contains the following parameters:
> {code}
> A = load '/user/viraj/test' using PigStorage() as (a,b,c);
> B = SAMPLE A 1/$number;
> dump B;
> {code}
> Realistic use cases of SAMPLE require statisticians to calculate the sample 
> size on demand.
> Ideally I would like to calculate it from within a Pig script, without 
> having to run one Pig script first to get its results and then another to 
> pass those results in.
> Ideal use case:
> {code}
> A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
> ...
> ...
> W = group X by col1;
> Z = foreach Y generate AVG(X);
> AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
> BB = SAMPLE AA 1/Z;
> dump BB;
> {code}
> Viraj
> Change this Jira to only track sampling algorithm. PIG-1926 is opened to 
> track limit/sample taking scalar.
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012
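
For readers unfamiliar with SAMPLE, the first script above can be modelled in a few lines of Python. This is a sketch of the semantics only, assuming SAMPLE behaves as a per-row random filter (so the number of rows returned varies from run to run); the function names are hypothetical and nothing here is Pig code.

```python
import random


def sample_fraction(number):
    """Turn the -p number=N parameter into the fraction 1/N."""
    if number <= 0:
        raise ValueError("number must be positive")
    return 1.0 / number


def sample(rows, fraction, rng=None):
    """Keep each row independently with probability `fraction`."""
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so fraction 1.0 keeps every row
    # and fraction 0.0 keeps none.
    return [row for row in rows if rng.random() < fraction]


rows = list(range(1000))
kept = sample(rows, sample_fraction(100), random.Random(42))
print(len(kept), "rows kept out of", len(rows))
```

The request in this issue is that the sampling algorithm itself be selectable; the "ideal use case", where the fraction comes from another relation rather than a command-line parameter, is tracked separately in PIG-1926.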



[jira] [Commented] (PIG-3221) Bootstrap sampling

2013-04-25 Thread Vicki Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642142#comment-13642142
 ] 

Vicki Fu commented on PIG-3221:
---

Hi Gianmarco,
I have finished the first draft of my GSoC 2013 proposal. Could you please 
give me some feedback?
http://vickifu.info/?p=29
Thanks
Vicky

> Bootstrap sampling
> --
>
> Key: PIG-3221
> URL: https://issues.apache.org/jira/browse/PIG-3221
> Project: Pig
> Issue Type: New Feature
> Reporter: Gianmarco De Francisci Morales
> Labels: gsoc2013
>
> Implement a bootstrap sampling option ( 
> http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE 
> operator.
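
As a reminder of what bootstrap sampling means, here is a minimal single-machine sketch in Python: draw as many rows as the input has, with replacement. The function name is hypothetical and this says nothing about how the option would be implemented in Pig, where rows cannot simply be indexed at random across a cluster.

```python
import random


def bootstrap_sample(rows, rng=None):
    """Return len(rows) rows drawn uniformly from `rows` with replacement."""
    rng = rng or random.Random()
    n = len(rows)
    # Each draw is independent, so a row may appear several times
    # and some rows may not appear at all.
    return [rows[rng.randrange(n)] for _ in range(n)]


data = ["a", "b", "c", "d", "e"]
resample = bootstrap_sample(data, random.Random(7))
print(resample)
```

Unlike the existing SAMPLE, which keeps or drops each row once, a bootstrap resample has the same size as its input and may contain duplicates.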



[jira] [Commented] (PIG-3177) Fix Pig project SEO so latest, 0.11 docs show when you google things

2013-04-25 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642012#comment-13642012
 ] 

Mark Wagner commented on PIG-3177:
--

[~russell.jurney], I think the proper way to do this is to use a sitemap.xml: 
http://www.sitemaps.org/protocol.html. We can promote the latest docs by giving 
them a higher 'priority' tag. Reading through articles, it's not clear to me 
whether it's possible to just specify http://pig.apache.org/docs/r0.11.1 to 
have a given priority, and all the docs underneath it will inherit that, or if 
we need to enumerate every page.
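
To make the suggestion concrete, here is a small Python sketch that writes a sitemap with a 'priority' per page. As far as I can tell from the protocol, each url entry describes exactly one location, so the sketch enumerates pages rather than relying on inheritance from a directory URL; that reading is an assumption and should be checked against the protocol page. The function name build_sitemap and the priority values are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, priority) pairs, priority in [0.0, 1.0]."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    # Latest docs get the highest priority, old docs the lowest.
    ("http://pig.apache.org/docs/r0.11.1/", 1.0),
    ("http://pig.apache.org/docs/r0.7.0/", 0.1),
])
print(sitemap_xml)
```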

> Fix Pig project SEO so latest, 0.11 docs show when you google things
> 
>
> Key: PIG-3177
> URL: https://issues.apache.org/jira/browse/PIG-3177
> Project: Pig
> Issue Type: Bug
> Components: site
> Affects Versions: 0.11
> Reporter: Russell Jurney
> Assignee: Russell Jurney
> Priority: Critical
> Fix For: 0.12
>
>
> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html
> The 0.7.0 docs are what everyone references. FOR POOPS SAKES.



[jira] [Updated] (PIG-3291) TestExampleGenerator fails on Windows because of lack of file name escaping

2013-04-25 Thread David Wannemacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Wannemacher updated PIG-3291:
---

Issue Type: Sub-task  (was: Bug)
Parent: PIG-2793

> TestExampleGenerator fails on Windows because of lack of file name escaping
> ---
>
> Key: PIG-3291
> URL: https://issues.apache.org/jira/browse/PIG-3291
> Project: Pig
> Issue Type: Sub-task
> Affects Versions: 0.12
> Environment: Windows
> Reporter: David Wannemacher
> Fix For: 0.12
>
> Attachments: PIG-3291.trunk.patch
>
>
> On Windows, all tests fail with an exception like this:
> Testcase: testFilterGroupCountStore took 0.022 sec
>   Caused an ERROR
> Error during parsing.   Unexpected character 'S'
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing.   Unexpected character 'S'
>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1669)
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1607)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:563)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:576)
>   at org.apache.pig.test.TestExampleGenerator.testFilterGroupCountStore(TestExampleGenerator.java:394)
> Caused by: Failed to parse:   Unexpected character 'S'
>   at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:235)
>   at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:174)
>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1660)
> Looks like a change in https://issues.apache.org/jira/browse/PIG-2170 caused 
> the file names to stop being escaped properly.
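
The kind of escaping involved can be sketched as follows, in Python rather than the test's Java. This assumes the failure comes from raw backslashes in a Windows path being embedded in a quoted string inside the query text; the issue does not spell that out, and the helper names are hypothetical rather than taken from the attached patch.

```python
def escape_for_query(path):
    """Double every backslash so it survives inside a quoted query string."""
    return path.replace("\\", "\\\\")


def load_statement(alias, path):
    """Build a load statement with the file name escaped."""
    return "{} = load '{}';".format(alias, escape_for_query(path))


# A Windows-style temp file path (illustrative only).
print(load_statement("A", "C:\\Users\\test\\input.txt"))
```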



[jira] [Reopened] (PIG-3164) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.

2013-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reopened PIG-3164:
-


Backed these changes out; I should never have checked them in.  I missed that 
this was only in test and not in main, so I ended up compiling the wrong thing 
to make sure this worked.

UDFs should not be added under piggybank/java/src/test.  That's for unit tests 
for the UDF.  The UDFs should be under piggybank/java/src/main.  

Thanks Niels for catching my mistake.

> Pig current releases lack a UDF endsWith.This UDF tests if a given string 
> ends with the specified suffix.
> -
>
> Key: PIG-3164
> URL: https://issues.apache.org/jira/browse/PIG-3164
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.10.0
> Reporter: Anuroopa George
> Assignee: Anuroopa George
> Fix For: 0.12
>
> Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java
>
>
> Current Pig releases lack an endsWith UDF. This UDF tests whether a given 
> string ends with the specified suffix. It returns true if the character 
> sequence represented by the suffix argument is a suffix of the character 
> sequence represented by the given string, and false otherwise. It also 
> returns true if the given suffix is an empty string or is equal to the 
> given string.
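
The semantics described above can be written down as a short Python sketch. This is not the attached ENDSWITH.java; the function name is hypothetical, and returning None for a null input is an assumption that the issue text does not cover.

```python
def ends_with(value, suffix):
    """Return True if `value` ends with `suffix`, per the issue description."""
    if value is None or suffix is None:
        # Assumption: null in, null out.
        return None
    # str.endswith already returns True for an empty suffix and for
    # a suffix equal to the whole string.
    return value.endswith(suffix)


print(ends_with("piggybank", "bank"))
print(ends_with("pig", ""))
print(ends_with("pig", "pig"))
print(ends_with("pig", "big"))
```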



[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves

2013-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3010:


Status: Open  (was: Patch Available)

Patch no longer applies.  This causes review board to not show the diffs 
either.  Sorry for waiting so long on this.

> Allow UDF's to flatten themselves
> -
>
> Key: PIG-3010
> URL: https://issues.apache.org/jira/browse/PIG-3010
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3010-0.patch, PIG-3010-1.patch, 
> PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch, PIG-3010-3_nows.patch, 
> PIG-3010-3.patch, PIG-3010-4_nows.patch, PIG-3010-4.patch, 
> PIG-3010-5_nows.patch, PIG-3010-5.patch
>
>
> This is something I thought would be cool for a while, so I sat down and did 
> it because I think there are some useful debugging tools it'd help with.
> The idea is that if you attach an annotation to a UDF, the Tuple or DataBag 
> you output will be flattened. This is quite powerful. A very common pattern 
> is:
> a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
> This would let you just do:
> a = foreach data generate MyUdf(thing);
> With the exact same result!
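
The difference between the two statements above can be modelled in Python. This sketch covers tuple-returning UDFs only (flattening a bag yields one row per element, which is not shown), and the names foreach_generate and my_udf are hypothetical stand-ins, not Pig APIs.

```python
def foreach_generate(rows, udf, flatten=False):
    """Apply `udf` to each row; optionally splice its tuple into the output row."""
    out = []
    for row in rows:
        result = udf(row)
        if flatten:
            # Flattened: the tuple's fields become the row's fields.
            out.append(tuple(result))
        else:
            # Not flattened: the row has a single field holding the tuple.
            out.append((result,))
    return out


def my_udf(thing):
    """Stand-in UDF that splits a string into a three-field tuple."""
    return tuple(thing.split("-"))


data = ["a-b-c", "d-e-f"]
print(foreach_generate(data, my_udf))
print(foreach_generate(data, my_udf, flatten=True))
```

The proposal amounts to letting the UDF itself, through an annotation, select the flattened behaviour so that the script does not need the explicit Flatten call.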



[jira] [Commented] (PIG-3164) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.

2013-04-25 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641898#comment-13641898
 ] 

Niels Basjes commented on PIG-3164:
---

This code contains this line
   public class ENDSWITH extends EvalFunc throws ExecException {   

A class cannot throw anything.
Please reopen and fix.

> Pig current releases lack a UDF endsWith.This UDF tests if a given string 
> ends with the specified suffix.
> -
>
> Key: PIG-3164
> URL: https://issues.apache.org/jira/browse/PIG-3164
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.10.0
> Reporter: Anuroopa George
> Assignee: Anuroopa George
> Fix For: 0.12
>
> Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java
>
>
> Current Pig releases lack an endsWith UDF. This UDF tests whether a given 
> string ends with the specified suffix. It returns true if the character 
> sequence represented by the suffix argument is a suffix of the character 
> sequence represented by the given string, and false otherwise. It also 
> returns true if the given suffix is an empty string or is equal to the 
> given string.



[jira] [Commented] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc

2013-04-25 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641891#comment-13641891
 ] 

Niels Basjes commented on PIG-3297:
---

I have a working fix that I'll submit shortly.

> Avro files with stringType set to String cannot be read by the AvroStorage 
> LoadFunc
> ---
>
> Key: PIG-3297
> URL: https://issues.apache.org/jira/browse/PIG-3297
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Niels Basjes
>
> When an Avro file is created, there is an option to set the "String Type" 
> to a class other than the default Utf8.
> A very common choice is to set the "String Type" to the standard 
> java.lang.String class.
> Reading such an Avro file in Pig with the AvroStorage LoadFunc from the 
> included piggybank throws the following exception:
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)



[jira] [Commented] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files

2013-04-25 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641590#comment-13641590
 ] 

Jonathan Coveney commented on PIG-3215:
---

Do it!

> [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
> 
>
> Key: PIG-3215
> URL: https://issues.apache.org/jira/browse/PIG-3215
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Reporter: MIYAKAWA Taku
> Assignee: MIYAKAWA Taku
> Labels: piggybank
> Attachments: LTSVLoader-6.html, LTSVLoader.html, PIG-3215-6.patch, 
> PIG-3215.patch
>
>
> LTSV, or Labeled Tab-separated Values format is now getting popular in Japan 
> for log files, especially of web servers. The goal of this jira is to add 
> LTSVLoader in PiggyBank to load LTSV files.
> LTSV is based on TSV thus columns are separated by tab characters. 
> Additionally each of columns includes a label and a value, separated by ":" 
> character.
> Read about LTSV on http://ltsv.org/.
> h4. Example LTSV file (access.log)
> Columns are separated by tab characters.
> {noformat}
> host:host1.example.org	req:GET /index.html	ua:Opera/9.80
> host:host1.example.org	req:GET /favicon.ico	ua:Opera/9.80
> host:pc.example.com	req:GET /news.html	ua:Mozilla/5.0
> {noformat}
> h4. Usage 1: Extract fields from each line
> Users can specify an input schema and get columns as Pig fields.
> This example loads the LTSV file shown in the previous section.
> {code}
> -- Parses the access log and counts the number of lines
> -- for each pair of the host column and the ua column.
> access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader('host:chararray, ua:chararray');
> grouped_access = GROUP access BY (host, ua);
> count_for_host_ua = FOREACH grouped_access GENERATE group.host, group.ua, COUNT(access);
> DUMP count_for_host_ua;
> {code}
> The below text will be printed out.
> {noformat}
> (host1.example.org,Opera/9.80,2)
> (pc.example.com,Mozilla/5.0,1)
> {noformat}
> h4. Usage 2: Extract a map from each line
> Users can get a map for each LTSV line. The key of a map is a label of the 
> LTSV column. The value of a map comes from characters after ":" in the LTSV 
> column.
> {code}
> -- Parses the access log and projects the user agent field.
> access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (m:map[]);
> user_agent = FOREACH access GENERATE m#'ua' AS ua;
> DUMP user_agent;
> {code}
> The below text will be printed out.
> {noformat}
> (Opera/9.80)
> (Opera/9.80)
> (Mozilla/5.0)
> {noformat}
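
For readers who want to see the format handled outside Pig, here is a minimal Python sketch of LTSV parsing that mirrors the two usages above. It is not the LTSVLoader code; the function name parse_ltsv_line is hypothetical, and silently skipping columns that have no ":" is an assumption made for this sketch.

```python
from collections import Counter


def parse_ltsv_line(line):
    """Parse one LTSV line into a dict of label -> value."""
    record = {}
    for column in line.rstrip("\n").split("\t"):
        # Split on the first ':' only, so values may contain ':' themselves.
        label, sep, value = column.partition(":")
        if not sep:
            continue  # no label/value separator; ignored in this sketch
        record[label] = value
    return record


log_lines = [
    "host:host1.example.org\treq:GET /index.html\tua:Opera/9.80",
    "host:host1.example.org\treq:GET /favicon.ico\tua:Opera/9.80",
    "host:pc.example.com\treq:GET /news.html\tua:Mozilla/5.0",
]

records = [parse_ltsv_line(line) for line in log_lines]

# Usage 1: count lines for each (host, ua) pair.
counts = Counter((r["host"], r["ua"]) for r in records)
print(counts)

# Usage 2: treat each line as a map and project the 'ua' entry.
print([r["ua"] for r in records])
```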



[jira] [Created] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc

2013-04-25 Thread Niels Basjes (JIRA)
Niels Basjes created PIG-3297:
-

             Summary: Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
                 Key: PIG-3297
                 URL: https://issues.apache.org/jira/browse/PIG-3297
             Project: Pig
          Issue Type: Bug
          Components: piggybank
    Affects Versions: 0.11.1
            Reporter: Niels Basjes


When an Avro file is created, there is an option to set the "String Type" 
to a class other than the default Utf8.
A very common choice is to set the "String Type" to the standard 
java.lang.String class.

Reading such an Avro file in Pig with the AvroStorage LoadFunc from the 
included piggybank throws the following exception:

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)

