[jira] Subscription: PIG patch available

2014-11-12 Thread jira
Issue Subscription
Filter: PIG patch available (20 issues)

Subscriber: pigdaily

Key       Summary
PIG-4329  Fetch optimization should be disabled when limit is not pushed up
          https://issues.apache.org/jira/browse/PIG-4329
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4264  Port TestAvroStorage to tez local mode
          https://issues.apache.org/jira/browse/PIG-4264
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4239  "pig.output.lazy" not works in spark mode
          https://issues.apache.org/jira/browse/PIG-4239
PIG-4207  Make python udfs work with Spark
          https://issues.apache.org/jira/browse/PIG-4207
PIG-4111  Make Pig compiles with avro-1.7.7
          https://issues.apache.org/jira/browse/PIG-4111
PIG-4103  Fix TestRegisteredJarVisibility(after PIG-4083)
          https://issues.apache.org/jira/browse/PIG-4103
PIG-4066  An optimization for ROLLUP operation in Pig
          https://issues.apache.org/jira/browse/PIG-4066
PIG-4004  Upgrade the Pigmix queries from the (old) mapred API to mapreduce
          https://issues.apache.org/jira/browse/PIG-4004
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
          https://issues.apache.org/jira/browse/PIG-3866
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
          https://issues.apache.org/jira/browse/PIG-3635
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587
PIG-3441  Allow Pig to use default resources from Configuration objects
          https://issues.apache.org/jira/browse/PIG-3441

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384


[jira] [Updated] (PIG-4329) Fetch optimization should be disabled when limit is not pushed up

2014-11-12 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4329:
---
Status: Patch Available  (was: Open)

> Fetch optimization should be disabled when limit is not pushed up
> -
>
> Key: PIG-4329
> URL: https://issues.apache.org/jira/browse/PIG-4329
> Project: Pig
>  Issue Type: Bug
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.15.0
>
> Attachments: PIG-4329-1.patch
>
>
> Although PIG-4135 disables fetch optimization when there is no limit in the
> plan, that doesn't solve the problem completely. In fact, fetch optimization
> should still be disabled if the limit is not pushed up. Consider the following
> query:
> {code}
> random_lists = load 'prodhive.schakraborty.search_server_denorm_impressions' 
> using DseStorage();
> random_lists = filter random_lists by entity_section=='random';
> random_lists = limit random_lists 10;
> dump random_lists;
> {code}
> Because the {{filter by}} blocks limit from being pushed up, POLoad actually 
> scans the full table. In this case, fetch optimization makes the job 
> extremely slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4329) Fetch optimization should be disabled when limit is not pushed up

2014-11-12 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4329:
---
Attachment: PIG-4329-1.patch

Uploading a patch that disables fetch optimization when limit is not pushed up.

> Fetch optimization should be disabled when limit is not pushed up
> -
>
> Key: PIG-4329
> URL: https://issues.apache.org/jira/browse/PIG-4329
> Project: Pig
>  Issue Type: Bug
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.15.0
>
> Attachments: PIG-4329-1.patch
>
>
> Although PIG-4135 disables fetch optimization when there is no limit in the
> plan, that doesn't solve the problem completely. In fact, fetch optimization
> should still be disabled if the limit is not pushed up. Consider the following
> query:
> {code}
> random_lists = load 'prodhive.schakraborty.search_server_denorm_impressions' 
> using DseStorage();
> random_lists = filter random_lists by entity_section=='random';
> random_lists = limit random_lists 10;
> dump random_lists;
> {code}
> Because the {{filter by}} blocks limit from being pushed up, POLoad actually 
> scans the full table. In this case, fetch optimization makes the job 
> extremely slow.





[jira] [Updated] (PIG-4329) Fetch optimization should be disabled when limit is not pushed up

2014-11-12 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4329:
---
Description: 
Although PIG-4135 disables fetch optimization when there is no limit in the
plan, that doesn't solve the problem completely. In fact, fetch optimization
should still be disabled if the limit is not pushed up. Consider the following
query:
{code}
random_lists = load 'prodhive.schakraborty.search_server_denorm_impressions' 
using DseStorage();
random_lists = filter random_lists by entity_section=='random';
random_lists = limit random_lists 10;
dump random_lists;
{code}
Because the {{filter by}} blocks limit from being pushed up, POLoad actually 
scans the full table. In this case, fetch optimization makes the job extremely 
slow.

  was:
Although PIG-4135 disables fetch optimization when there is no limit in the
plan, that doesn't solve the problem completely. In fact, fetch optimization
should still be disabled if the limit is not pushed up. Consider the following
query:
{code}
random_lists = load 'prodhive.schakraborty.search_server_denorm_impressions' 
using DseStorage();
random_lists = filter random_lists by entity_section=='random');
random_lists = limit random_lists 10;
dump random_lists;
{code}
Because the {{filter by}} blocks limit from being pushed up, POLoad actually 
scans the full table. In this case, fetch optimization makes the job extremely 
slow.


> Fetch optimization should be disabled when limit is not pushed up
> -
>
> Key: PIG-4329
> URL: https://issues.apache.org/jira/browse/PIG-4329
> Project: Pig
>  Issue Type: Bug
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.15.0
>
>
> Although PIG-4135 disables fetch optimization when there is no limit in the
> plan, that doesn't solve the problem completely. In fact, fetch optimization
> should still be disabled if the limit is not pushed up. Consider the following
> query:
> {code}
> random_lists = load 'prodhive.schakraborty.search_server_denorm_impressions' 
> using DseStorage();
> random_lists = filter random_lists by entity_section=='random';
> random_lists = limit random_lists 10;
> dump random_lists;
> {code}
> Because the {{filter by}} blocks limit from being pushed up, POLoad actually 
> scans the full table. In this case, fetch optimization makes the job 
> extremely slow.





[jira] [Created] (PIG-4329) Fetch optimization should be disabled when limit is not pushed up

2014-11-12 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-4329:
--

 Summary: Fetch optimization should be disabled when limit is not 
pushed up
 Key: PIG-4329
 URL: https://issues.apache.org/jira/browse/PIG-4329
 Project: Pig
  Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.15.0


Although PIG-4135 disables fetch optimization when there is no limit in the
plan, that doesn't solve the problem completely. In fact, fetch optimization
should still be disabled if the limit is not pushed up. Consider the following
query:
{code}
random_lists = load 'prodhive.schakraborty.search_server_denorm_impressions' 
using DseStorage();
random_lists = filter random_lists by entity_section=='random');
random_lists = limit random_lists 10;
dump random_lists;
{code}
Because the {{filter by}} blocks limit from being pushed up, POLoad actually 
scans the full table. In this case, fetch optimization makes the job extremely 
slow.
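
The condition described above can be sketched in Java as a toy check. This is an illustration only, not the PIG-4329 patch: the class name, the fetchAllowed method, and the modelling of a plan as a flat list of operator names are all hypothetical. It assumes that "pushed up" means the LIMIT sits directly after the LOAD with nothing (such as a FILTER) in between.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; none of these names come from Pig itself.
class FetchDecisionSketch {

    /**
     * Returns true only when a LIMIT directly follows the LOAD,
     * i.e. the limit was pushed up and the loader can stop early.
     */
    static boolean fetchAllowed(List<String> plan) {
        int load = plan.indexOf("LOAD");
        int limit = plan.indexOf("LIMIT");
        if (load < 0 || limit < 0) {
            return false; // no limit in the plan at all: the PIG-4135 case
        }
        return limit == load + 1; // anything in between blocks the push-up
    }

    public static void main(String[] args) {
        // The query from this issue: a filter sits between load and limit.
        System.out.println(fetchAllowed(Arrays.asList("LOAD", "FILTER", "LIMIT", "STORE"))); // false
        // Limit directly after load: a full scan is avoided.
        System.out.println(fetchAllowed(Arrays.asList("LOAD", "LIMIT", "STORE"))); // true
    }
}
```

In the first case the loader would still scan the full table before the limit applies, which is why fetch mode is the wrong choice there.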





[VOTE] Release Pig 0.14.0 (candidate 0)

2014-11-12 Thread Daniel Dai
Hi,

I have created a candidate build for Pig 0.14.0.

Keys used to sign the release are available at
http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out:
http://people.apache.org/~daijy/pig-0.14.0-candidate-0/

Release notes and the rat report are available at the same location.

Should we release this? The vote closes next Monday EOD, Nov 17th 2014.

Thanks,
Daniel



[jira] [Updated] (PIG-4321) Documentation for 0.14

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4321:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and 0.14 branch. Thanks Rohini for review!

> Documentation for 0.14
> --
>
> Key: PIG-4321
> URL: https://issues.apache.org/jira/browse/PIG-4321
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4321-1.patch, PIG-4321-2.patch, PIG-4321-3.patch, 
> PIG-4321-4.patch
>
>






Re: Review Request 27938: Documentation for 0.14

2014-11-12 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27938/#review61194
---

Ship it!



- Rohini Palaniswamy


On Nov. 13, 2014, 3:02 a.m., Daniel Dai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27938/
> ---
> 
> (Updated Nov. 13, 2014, 3:02 a.m.)
> 
> 
> Review request for pig and Rohini Palaniswamy.
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> See PIG-4321
> 
> 
> Diffs
> -
> 
>   trunk/src/docs/src/documentation/content/xdocs/cont.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/func.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/perf.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/start.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/tabs.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/test.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/udf.xml 1637955 
> 
> Diff: https://reviews.apache.org/r/27938/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Daniel Dai
> 
>



[jira] [Commented] (PIG-4321) Documentation for 0.14

2014-11-12 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209175#comment-14209175
 ] 

Rohini Palaniswamy commented on PIG-4321:
-

+1

> Documentation for 0.14
> --
>
> Key: PIG-4321
> URL: https://issues.apache.org/jira/browse/PIG-4321
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4321-1.patch, PIG-4321-2.patch, PIG-4321-3.patch, 
> PIG-4321-4.patch
>
>






[jira] [Updated] (PIG-4321) Documentation for 0.14

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4321:

Attachment: PIG-4321-4.patch

Revision based on Rohini's review comments.

> Documentation for 0.14
> --
>
> Key: PIG-4321
> URL: https://issues.apache.org/jira/browse/PIG-4321
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4321-1.patch, PIG-4321-2.patch, PIG-4321-3.patch, 
> PIG-4321-4.patch
>
>






Re: Review Request 27938: Documentation for 0.14

2014-11-12 Thread Daniel Dai


> On Nov. 13, 2014, 2:06 a.m., Rohini Palaniswamy wrote:
> > trunk/src/docs/src/documentation/content/xdocs/start.xml, lines 48-49
> > 
> >
> > Why remove these?

Pig does ship those jars in lib, so there is no need to download them separately.


- Daniel


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27938/#review61140
---


On Nov. 12, 2014, 11:49 p.m., Daniel Dai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27938/
> ---
> 
> (Updated Nov. 12, 2014, 11:49 p.m.)
> 
> 
> Review request for pig and Rohini Palaniswamy.
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> See PIG-4321
> 
> 
> Diffs
> -
> 
>   trunk/src/docs/src/documentation/content/xdocs/cont.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/func.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/perf.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/start.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/tabs.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/test.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/udf.xml 1637955 
> 
> Diff: https://reviews.apache.org/r/27938/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Daniel Dai
> 
>



Re: Review Request 27938: Documentation for 0.14

2014-11-12 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27938/
---

(Updated Nov. 13, 2014, 3:02 a.m.)


Review request for pig and Rohini Palaniswamy.


Repository: pig


Description
---

See PIG-4321


Diffs (updated)
-

  trunk/src/docs/src/documentation/content/xdocs/cont.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/func.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/perf.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/start.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/tabs.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/test.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/udf.xml 1637955 

Diff: https://reviews.apache.org/r/27938/diff/


Testing
---


Thanks,

Daniel Dai



Re: Review Request 27938: Documentation for 0.14

2014-11-12 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27938/#review61140
---



trunk/src/docs/src/documentation/content/xdocs/func.xml


Loads from or stores data to



trunk/src/docs/src/documentation/content/xdocs/func.xml


current options are only applicable with STORE operation and not for LOAD. 
(Just to make it more explicit)



trunk/src/docs/src/documentation/content/xdocs/func.xml


In internal documentation, I had the default values added. Pasting that 
below. Could we add them here as well?

-s, --stripeSize Set the stripe size for the file. Default is 
268435456(256 MB).
-r, --rowIndexStride Set the distance between entries in the row index. 
Default is 1.
-b, --bufferSize The size of the memory buffers used for compressing 
and storing the stripe in memory. Default is 262144 (256K).
-p, --blockPadding Sets whether the HDFS blocks are padded to prevent 
stripes from straddling blocks. Default is true.
-c, --compress Sets the generic compression that is used to compress 
the data. Valid codecs are: NONE, ZLIB, SNAPPY, LZO. Default is ZLIB.
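
The byte counts quoted for the stripe size and buffer size can be checked against their human-readable sizes with a little arithmetic. The sketch below assumes binary units (1 K = 1024 bytes, 1 MB = 1024 K); the class and method names are hypothetical and are not part of Pig or ORC.

```java
// Hypothetical helper for checking the quoted defaults; not a Pig or ORC API.
class OrcDefaultSizes {
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;

    /** 256 MB expressed in bytes. */
    static long stripeSizeBytes() {
        return 256 * MIB;
    }

    /** 256 K expressed in bytes. */
    static long bufferSizeBytes() {
        return 256 * KIB;
    }

    public static void main(String[] args) {
        System.out.println(stripeSizeBytes()); // 268435456
        System.out.println(bufferSizeBytes()); // 262144
    }
}
```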



trunk/src/docs/src/documentation/content/xdocs/func.xml


primary data types but non of the complex data file -> primitive data types 
but none of the complex data types



trunk/src/docs/src/documentation/content/xdocs/func.xml


each record in the alias (Little confusing when we mention job or vertex)



trunk/src/docs/src/documentation/content/xdocs/perf.xml


simply putting -> simply add



trunk/src/docs/src/documentation/content/xdocs/perf.xml


Prerequisite: Tez requires the Tez tarball to be available in HDFS while
running a job on the cluster, and a tez-site.xml with the tez.lib.uris setting
pointing to that HDFS location in the classpath. Copy the Tez tarball to HDFS and
add the Tez conf directory ($TEZ_HOME/conf) containing tez-site.xml to the
environment variable "PIG_CLASSPATH" if Pig on Tez fails with "tez.lib.uris
is not defined". This is required by the Apache Pig distribution.



trunk/src/docs/src/documentation/content/xdocs/perf.xml


eges -> edges



trunk/src/docs/src/documentation/content/xdocs/perf.xml


be sure to implement a cleanup function and register with https://reviews.apache.org/r/27938/#comment102693>

Pig will obey it  -> Pig will honor it



trunk/src/docs/src/documentation/content/xdocs/start.xml


Why remove these?



trunk/src/docs/src/documentation/content/xdocs/start.xml


some queries error out or hang in ...

There are some queries which just error out on bigger data in local mode.


- Rohini Palaniswamy


On Nov. 12, 2014, 11:49 p.m., Daniel Dai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27938/
> ---
> 
> (Updated Nov. 12, 2014, 11:49 p.m.)
> 
> 
> Review request for pig and Rohini Palaniswamy.
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> See PIG-4321
> 
> 
> Diffs
> -
> 
>   trunk/src/docs/src/documentation/content/xdocs/cont.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/func.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/perf.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/start.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/tabs.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/test.xml 1637955 
>   trunk/src/docs/src/documentation/content/xdocs/udf.xml 1637955 
> 
> Diff: https://reviews.apache.org/r/27938/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Daniel Dai
> 
>



[jira] [Updated] (PIG-4328) Upgrade Hive to 0.14

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4328:

Issue Type: Improvement  (was: Bug)

> Upgrade Hive to 0.14
> 
>
> Key: PIG-4328
> URL: https://issues.apache.org/jira/browse/PIG-4328
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4328-1.patch
>
>
> Hive 0.14.0 artifacts are available. We shall switch to use the released 
> version.





[jira] [Commented] (PIG-4328) Upgrade Hive to 0.14

2014-11-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209090#comment-14209090
 ] 

Thejas M Nair commented on PIG-4328:


+1

> Upgrade Hive to 0.14
> 
>
> Key: PIG-4328
> URL: https://issues.apache.org/jira/browse/PIG-4328
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4328-1.patch
>
>
> Hive 0.14.0 artifacts are available. We shall switch to use the released 
> version.





[jira] [Updated] (PIG-4328) Upgrade Hive to 0.14

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4328:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to both trunk and 0.14 branch. Thanks Thejas!

> Upgrade Hive to 0.14
> 
>
> Key: PIG-4328
> URL: https://issues.apache.org/jira/browse/PIG-4328
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4328-1.patch
>
>
> Hive 0.14.0 artifacts are available. We shall switch to use the released 
> version.





[jira] [Updated] (PIG-4325) StackOverflow when spilling InternalCachedBag

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4325:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and 0.14 branch. Thanks Rohini for review!

> StackOverflow when spilling InternalCachedBag
> -
>
> Key: PIG-4325
> URL: https://issues.apache.org/jira/browse/PIG-4325
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4325-1.patch
>
>
> See the following stack:
> {code}
> exceptionThrown=java.lang.StackOverflowError
>   at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:121)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
> ..
> {code}
> Pig makes a recursive call in InternalCachedBag.hashCode.
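
The cycle visible in the stack above (hashCode, then a spill warning, then a WeakHashMap lookup keyed by the same object, then hashCode again) can be reproduced in a small Java sketch. Everything here is a hypothetical stand-in: the class names, the DEPTH_CAP constant (which replaces the real stack limit so the demo terminates), and the class-keyed logger. The thread does not say how the PIG-4325 patch breaks the cycle, so the second logger shows only one possible approach.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-ins; none of these names come from Pig.
class HashCodeCycleSketch {
    static final int DEPTH_CAP = 4; // stands in for the JVM stack limit
    static int depth = 0;
    static int maxDepth = 0;

    interface WarnLogger {
        void warn(Object source);
    }

    /** Keys warning counts by the source object, so every lookup hashes the source. */
    static class ObjectKeyedLogger implements WarnLogger {
        private final Map<Object, Integer> counts = new WeakHashMap<>();

        public void warn(Object source) {
            Integer seen = counts.get(source); // calls source.hashCode()
            counts.put(source, seen == null ? 1 : seen + 1);
        }
    }

    /** Keys warning counts by class, so the source's hashCode() is never called. */
    static class ClassKeyedLogger implements WarnLogger {
        private final Map<Class<?>, Integer> counts = new WeakHashMap<>();

        public void warn(Object source) {
            Class<?> key = source.getClass();
            Integer seen = counts.get(key);
            counts.put(key, seen == null ? 1 : seen + 1);
        }
    }

    /** A bag whose hashCode() has a side effect that logs a warning. */
    static class SpillingBag {
        private final WarnLogger logger;

        SpillingBag(WarnLogger logger) {
            this.logger = logger;
        }

        @Override
        public int hashCode() {
            depth++;
            maxDepth = Math.max(maxDepth, depth);
            try {
                if (depth < DEPTH_CAP) {
                    logger.warn(this); // the "spill" side effect
                }
                return System.identityHashCode(this);
            } finally {
                depth--;
            }
        }
    }

    /** Returns the deepest hashCode() nesting reached with the given logger. */
    static int nestingWith(WarnLogger logger) {
        depth = 0;
        maxDepth = 0;
        new SpillingBag(logger).hashCode();
        return maxDepth;
    }

    public static void main(String[] args) {
        System.out.println(nestingWith(new ObjectKeyedLogger())); // 4: re-entered until the cap
        System.out.println(nestingWith(new ClassKeyedLogger()));  // 1: no re-entry
    }
}
```

Without the cap, the object-keyed variant would recurse until the JVM throws StackOverflowError, which matches the shape of the reported trace.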





[jira] [Commented] (PIG-4321) Documentation for 0.14

2014-11-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208925#comment-14208925
 ] 

Daniel Dai commented on PIG-4321:
-

RB link: https://reviews.apache.org/r/27938/

> Documentation for 0.14
> --
>
> Key: PIG-4321
> URL: https://issues.apache.org/jira/browse/PIG-4321
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4321-1.patch, PIG-4321-2.patch, PIG-4321-3.patch
>
>






Review Request 27938: Documentation for 0.14

2014-11-12 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27938/
---

Review request for pig and Rohini Palaniswamy.


Repository: pig


Description
---

See PIG-4321


Diffs
-

  trunk/src/docs/src/documentation/content/xdocs/cont.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/func.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/perf.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/start.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/tabs.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/test.xml 1637955 
  trunk/src/docs/src/documentation/content/xdocs/udf.xml 1637955 

Diff: https://reviews.apache.org/r/27938/diff/


Testing
---


Thanks,

Daniel Dai



[jira] [Updated] (PIG-4328) Upgrade Hive to 0.14

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4328:

Status: Patch Available  (was: Open)

> Upgrade Hive to 0.14
> 
>
> Key: PIG-4328
> URL: https://issues.apache.org/jira/browse/PIG-4328
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4328-1.patch
>
>
> Hive 0.14.0 artifacts are available. We shall switch to use the released 
> version.





[jira] [Updated] (PIG-4328) Upgrade Hive to 0.14

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4328:

Attachment: PIG-4328-1.patch

> Upgrade Hive to 0.14
> 
>
> Key: PIG-4328
> URL: https://issues.apache.org/jira/browse/PIG-4328
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4328-1.patch
>
>
> Hive 0.14.0 artifacts are available. We shall switch to use the released 
> version.





[jira] [Created] (PIG-4328) Upgrade Hive to 0.14

2014-11-12 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4328:
---

 Summary: Upgrade Hive to 0.14
 Key: PIG-4328
 URL: https://issues.apache.org/jira/browse/PIG-4328
 Project: Pig
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0


Hive 0.14.0 artifacts are available. We shall switch to use the released 
version.





[jira] [Commented] (PIG-4325) StackOverflow when spilling InternalCachedBag

2014-11-12 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208917#comment-14208917
 ] 

Rohini Palaniswamy commented on PIG-4325:
-

+1. 

> StackOverflow when spilling InternalCachedBag
> -
>
> Key: PIG-4325
> URL: https://issues.apache.org/jira/browse/PIG-4325
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4325-1.patch
>
>
> See the following stack:
> {code}
> exceptionThrown=java.lang.StackOverflowError
>   at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:121)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
> ..
> {code}
> Pig made a recursive call in InternalCachedBag.hashCode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4321) Documentation for 0.14

2014-11-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208900#comment-14208900
 ] 

Daniel Dai commented on PIG-4321:
-

The compiled doc can be found here: 
http://people.apache.org/~daijy/pig-0.14.0/doc/

> Documentation for 0.14
> --
>
> Key: PIG-4321
> URL: https://issues.apache.org/jira/browse/PIG-4321
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4321-1.patch, PIG-4321-2.patch, PIG-4321-3.patch
>
>






[jira] [Updated] (PIG-4325) StackOverflow when spilling InternalCachedBag

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4325:

Status: Patch Available  (was: Open)

All tests pass.

> StackOverflow when spilling InternalCachedBag
> -
>
> Key: PIG-4325
> URL: https://issues.apache.org/jira/browse/PIG-4325
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4325-1.patch
>
>
> See the following stack:
> {code}
> exceptionThrown=java.lang.StackOverflowError
>   at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:121)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:72)
>   at org.apache.pig.data.DefaultAbstractBag.incSpillCount(DefaultAbstractBag.java:446)
>   at org.apache.pig.data.InternalCachedBag.updateSpillRecCounter(InternalCachedBag.java:114)
>   at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:129)
>   at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:158)
>   at org.apache.pig.data.DefaultAbstractBag.hashCode(DefaultAbstractBag.java:363)
>   at java.util.WeakHashMap.hash(WeakHashMap.java:365)
>   at java.util.WeakHashMap.get(WeakHashMap.java:464)
> ..
> {code}
> Pig made a recursive call in InternalCachedBag.hashCode.
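
To make the cycle in the trace concrete, here is a minimal, self-contained sketch. It is a toy model under stated assumptions, not Pig's code: WarnLogger and ToyBag are hypothetical stand-ins for the logger and bag frames in the trace, DEMO_LIMIT is a guard added so the demo stops at a fixed depth instead of overflowing the stack, and the keyByClass switch is only one conceivable way to break the cycle, not necessarily what PIG-4325-1.patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

/** Toy model of the cycle; all names are hypothetical stand-ins, not Pig's classes. */
final class SpillWarnCycleSketch {

    /** Counts warnings per source in a WeakHashMap, like the logger frame in the trace. */
    static final class WarnLogger {
        private final Map<Object, Integer> counts = new WeakHashMap<>();
        private final boolean keyByClass;

        WarnLogger(boolean keyByClass) {
            this.keyByClass = keyByClass;
        }

        void warn(Object source) {
            // Keying by the object itself makes get()/put() call source.hashCode().
            Object key = keyByClass ? source.getClass() : source;
            Integer seen = counts.get(key);
            counts.put(key, seen == null ? 1 : seen + 1);
        }
    }

    /** Bag whose hashCode() iterates, and whose iteration setup emits a warning. */
    static final class ToyBag {
        static final int DEMO_LIMIT = 50; // demo-only guard instead of a real overflow
        private final List<String> items = new ArrayList<>();
        private final WarnLogger logger;
        private int depth;       // current hashCode() nesting
        private boolean tripped; // set once DEMO_LIMIT is reached
        int maxDepth;            // deepest nesting observed

        ToyBag(WarnLogger logger) {
            this.logger = logger;
        }

        void add(String item) {
            items.add(item);
        }

        private List<String> contents() {
            // Stands in for "finish the spill, then report it to the logger".
            if (!tripped) {
                logger.warn(this);
            }
            return items;
        }

        @Override
        public int hashCode() {
            depth++;
            maxDepth = Math.max(maxDepth, depth);
            if (depth >= DEMO_LIMIT) {
                tripped = true;
            }
            try {
                int h = 1;
                for (String s : contents()) {
                    h = 31 * h + s.hashCode();
                }
                return h;
            } finally {
                depth--;
            }
        }
    }

    /** Returns the deepest hashCode() nesting seen for one top-level call. */
    static int nestingDepth(boolean keyByClass) {
        ToyBag bag = new ToyBag(new WarnLogger(keyByClass));
        bag.add("a");
        bag.hashCode();
        return bag.maxDepth;
    }

    public static void main(String[] args) {
        System.out.println("keyed by bag:   " + nestingDepth(false));
        System.out.println("keyed by class: " + nestingDepth(true));
    }
}
```

With the logger keyed by the bag itself, a single top-level hashCode() call keeps re-entering hashCode() until the demo guard trips. Keyed by class, the bag's hashCode() is never re-entered.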





[jira] [Updated] (PIG-4321) Documentation for 0.14

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4321:

Status: Patch Available  (was: Open)

> Documentation for 0.14
> --
>
> Key: PIG-4321
> URL: https://issues.apache.org/jira/browse/PIG-4321
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4321-1.patch, PIG-4321-2.patch, PIG-4321-3.patch
>
>






[jira] [Updated] (PIG-4321) Documentation for 0.14

2014-11-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4321:

Attachment: PIG-4321-3.patch

PIG-4321-3.patch is the inclusive patch. Ready for review.

> Documentation for 0.14
> --
>
> Key: PIG-4321
> URL: https://issues.apache.org/jira/browse/PIG-4321
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4321-1.patch, PIG-4321-2.patch, PIG-4321-3.patch
>
>






[jira] [Commented] (PIG-3346) New property that controls the number of combined splits

2014-11-12 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208363#comment-14208363
 ] 

Cheolsoo Park commented on PIG-3346:


[~rohini], thank you for your suggestion. I just tried setting 
{{mapreduce.input.fileinputformat.split.maxsize}}, but that didn't help with S3 
files. A few mapper tasks still load too many small files. My patch actually 
limits the number of combined splits and reports it as a counter, which I find 
quite helpful for debugging slow mappers.

> New property that controls the number of combined splits
> 
>
> Key: PIG-3346
> URL: https://issues.apache.org/jira/browse/PIG-3346
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.15.0
>
> Attachments: PIG-3346-2.patch, PIG-3346-3.patch, PIG-3346.patch
>
>
> Currently, the size of combined splits can be configured by the 
> {{pig.maxCombinedSplitSize}} property.
> Although this works fine most of the time, it can lead to an undesired situation 
> where a single mapper ends up loading a lot of combined splits. This is 
> particularly bad if Pig loads them from S3.
> So it would be useful if the max number of combined splits could be configured 
> via a property such as {{pig.maxCombinedSplitNum}}.
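
The idea can be sketched as a greedy grouping that closes a combined split when either a byte budget or a split-count limit is reached. This is a rough illustration only, not Pig's split-combining code: combine, maxBytes, and maxCount are hypothetical names, with maxBytes playing the role of {{pig.maxCombinedSplitSize}} and maxCount the role of the proposed {{pig.maxCombinedSplitNum}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy grouping of small input splits; not Pig's actual split-combining code. */
final class CombinedSplitSketch {

    /**
     * Greedily packs split sizes into groups. A group is closed when adding the
     * next split would exceed maxBytes, or when it already holds maxCount splits.
     */
    static List<List<Long>> combine(List<Long> splitSizes, long maxBytes, int maxCount) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : splitSizes) {
            boolean full = !current.isEmpty()
                    && (currentBytes + size > maxBytes || current.size() >= maxCount);
            if (full) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // 1000 small files of 1 KB each, with a 128 MB byte budget per combined split.
        List<Long> small = Collections.nCopies(1000, 1024L);
        long maxBytes = 128L * 1024 * 1024;

        // Size limit only: every file lands in a single combined split.
        System.out.println(combine(small, maxBytes, Integer.MAX_VALUE).size());
        // Adding a count limit of 100 spreads the files over more splits.
        System.out.println(combine(small, maxBytes, 100).size());
    }
}
```

With only the size limit, many tiny files fit under the byte budget and end up in one group, which is the situation described above. The count limit bounds how many files any one group can hold.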





[jira] [Created] (PIG-4327) Schema of map with value that has an alias can't be parsed again

2014-11-12 Thread Michael Prim (JIRA)
Michael Prim created PIG-4327:
-

 Summary: Schema of map with value that has an alias can't be 
parsed again
 Key: PIG-4327
 URL: https://issues.apache.org/jira/browse/PIG-4327
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.12.0, 0.13.0
Reporter: Michael Prim


I tried to create a map of a primitive type. The resulting schema can't be parsed 
again by the parser if there is an alias set for the value.

I could simply not set an alias, but Pig sets the alias itself, e.g. when 
converting Avro schemas to Pig schemas where there was a map of records in Avro.

See also my other bug report https://issues.apache.org/jira/browse/PIG-4326. 
Even without that fix, Pig produces schemas of maps with values that have an 
alias.

You can easily reproduce the crash using these two unit tests. The second one 
should actually succeed but throws a ParserException instead.

{code}
// Works: the map value has no alias.
@Test
public void testWorksWithoutAlias() throws FrontendException {
    List<FieldSchema> innerFields = new ArrayList<>();
    innerFields.add(new FieldSchema(null, DataType.LONG));
    List<FieldSchema> fields = new ArrayList<>();
    fields.add(new FieldSchema("mapAlias", new Schema(innerFields), DataType.MAP));

    // Round-trip the schema through its string form.
    Schema inputSchema = new Schema(fields);
    Schema fromString = Utils.getSchemaFromBagSchemaString(inputSchema.toString());
    assertEquals(inputSchema.toString(), fromString.toString());
}

// Throws a ParserException: the map value carries the alias "valueAlias".
@Test
public void testBreaksWithAlias() throws FrontendException {
    List<FieldSchema> innerFields = new ArrayList<>();
    innerFields.add(new FieldSchema("valueAlias", DataType.LONG));
    List<FieldSchema> fields = new ArrayList<>();
    fields.add(new FieldSchema("mapAlias", new Schema(innerFields), DataType.MAP));

    // Round-trip the schema through its string form.
    Schema inputSchema = new Schema(fields);
    Schema fromString = Utils.getSchemaFromBagSchemaString(inputSchema.toString());
    assertEquals(inputSchema.toString(), fromString.toString());
}
{code}

I suppose that the issue is in the grammar itself and easy to fix for someone 
who knows ANTLR. I don't think the issue is related to the actual type of the 
value, as I could also provide tests that fail if we use a complex type with an 
alias instead of a primitive.





[jira] [Updated] (PIG-4326) AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records

2014-11-12 Thread Michael Prim (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Prim updated PIG-4326:
--
Attachment: mapsOfArraysOfRecords.patch

I attached the patch, including a (failing) unit test against the current trunk, 
and the fix to make the test pass.

> AvroStorageSchemaConversionUtilities does not properly convert schema for 
> maps of arrays of records
> ---
>
> Key: PIG-4326
> URL: https://issues.apache.org/jira/browse/PIG-4326
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Michael Prim
> Attachments: mapsOfArraysOfRecords.patch
>
>
> I tried to convert the Avro schema of a map of arrays of records into the 
> proper Pig schema and always got empty map schemas in Pig.
> The reason is that AvroStorageSchemaConversionUtilities only assumes 
> records or primitive types as the content of the map. However, a map of arrays, 
> or a map of maps, could have a schema itself and requires a recursive call to 
> derive the full schema.
> I wrote a unit test for maps of arrays of records, which fails with 
> every Pig release since AvroStorage was rewritten (I think this was in 
> 0.12), and there have been no changes since then in trunk. 
> Furthermore, the attached patch contains the (rather simple) fix that makes the 
> schema conversion utils succeed.
> I would appreciate further comments and would like to know if this can be included upstream.





[jira] [Created] (PIG-4326) AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records

2014-11-12 Thread Michael Prim (JIRA)
Michael Prim created PIG-4326:
-

 Summary: AvroStorageSchemaConversionUtilities does not properly 
convert schema for maps of arrays of records
 Key: PIG-4326
 URL: https://issues.apache.org/jira/browse/PIG-4326
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0, 0.13.0
Reporter: Michael Prim


I tried to convert the Avro schema of a map of arrays of records into the 
proper Pig schema and always got empty map schemas in Pig.

The reason is that AvroStorageSchemaConversionUtilities only assumes 
records or primitive types as the content of the map. However, a map of arrays, or 
a map of maps, could have a schema itself and requires a recursive call to 
derive the full schema.

I wrote a unit test for maps of arrays of records, which fails with 
every Pig release since AvroStorage was rewritten (I think this was in 
0.12), and there have been no changes since then in trunk. 

Furthermore, the attached patch contains the (rather simple) fix that makes the 
schema conversion utils succeed.

I would appreciate further comments and would like to know if this can be included upstream.
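
The recursion described above can be illustrated with a toy schema tree. This is a sketch under stated assumptions, not the AvroStorageSchemaConversionUtilities code or the attached patch: Node, Kind, and convert are hypothetical, and the output strings only loosely imitate Pig schema notation. With recurseIntoMapValues switched off, the converter mimics the reported behaviour and yields an empty map schema for a map of arrays of records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy schema conversion; hypothetical types, not Avro's or Pig's API. */
final class MapSchemaRecursionSketch {

    enum Kind { LONG, STRING, RECORD, ARRAY, MAP }

    /** Schema node: children are record fields, the array element, or the map value. */
    static final class Node {
        final Kind kind;
        final List<Node> children;

        private Node(Kind kind, List<Node> children) {
            this.kind = kind;
            this.children = children;
        }

        static Node of(Kind kind, Node... children) {
            return new Node(kind, Arrays.asList(children));
        }
    }

    /** Renders a node; nested map values are only descended into when the flag is set. */
    static String convert(Node n, boolean recurseIntoMapValues) {
        switch (n.kind) {
            case LONG:
                return "long";
            case STRING:
                return "chararray";
            case RECORD: {
                List<String> fields = new ArrayList<>();
                for (Node child : n.children) {
                    fields.add(convert(child, recurseIntoMapValues));
                }
                return "(" + String.join(",", fields) + ")";
            }
            case ARRAY:
                return "{" + convert(n.children.get(0), recurseIntoMapValues) + "}";
            case MAP: {
                Node value = n.children.get(0);
                boolean nested = value.kind == Kind.ARRAY || value.kind == Kind.MAP;
                if (nested && !recurseIntoMapValues) {
                    // Mimics the report: arrays and maps as map values are dropped.
                    return "map[]";
                }
                return "map[" + convert(value, recurseIntoMapValues) + "]";
            }
            default:
                throw new IllegalArgumentException("unknown kind: " + n.kind);
        }
    }

    /** Builds a map whose values are arrays of (long, string) records. */
    static Node mapOfArraysOfRecords() {
        Node record = Node.of(Kind.RECORD, Node.of(Kind.LONG), Node.of(Kind.STRING));
        return Node.of(Kind.MAP, Node.of(Kind.ARRAY, record));
    }

    public static void main(String[] args) {
        System.out.println(convert(mapOfArraysOfRecords(), false));
        System.out.println(convert(mapOfArraysOfRecords(), true));
    }
}
```

Descending into every map value, whatever its kind, is what lets the nested array and record appear in the result.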


