[jira] [Commented] (PIG-4784) Enable "pig.disable.counter" for spark engine

2016-02-04 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133777#comment-15133777
 ] 

liyunzhang_intel commented on PIG-4784:
---

[~pallavi.rao]: thanks for your review.
[~xuefuz]: please merge it to the spark branch.

> Enable "pig.disable.counter“ for spark engine
> -
>
> Key: PIG-4784
> URL: https://issues.apache.org/jira/browse/PIG-4784
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4784.patch, PIG-4784_2.patch
>
>
> When pig.disable.counter is set to "true" in conf/pig.properties, the 
> counters that track the number of input and output records are disabled. 
> The following unit tests cover this behavior but currently fail:
> org.apache.pig.test.TestPigRunner#testDisablePigCounters
> org.apache.pig.test.TestPigRunner#testDisablePigCounters2
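For readers unfamiliar with the setting, this is the shape of the configuration
being tested — a sketch of conf/pig.properties, with the property name and value
taken from the description above:

```
# conf/pig.properties (sketch; property name taken from this issue)
# When true, Pig skips the counters that track per-job input/output record counts.
pig.disable.counter=true
```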



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4784) Enable "pig.disable.counter" for spark engine

2016-02-04 Thread Pallavi Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133776#comment-15133776
 ] 

Pallavi Rao commented on PIG-4784:
--

+1 for the latest patch.

> Enable "pig.disable.counter“ for spark engine
> -
>
> Key: PIG-4784
> URL: https://issues.apache.org/jira/browse/PIG-4784
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4784.patch, PIG-4784_2.patch
>
>
> When pig.disable.counter is set to "true" in conf/pig.properties, the 
> counters that track the number of input and output records are disabled. 
> The following unit tests cover this behavior but currently fail:
> org.apache.pig.test.TestPigRunner#testDisablePigCounters
> org.apache.pig.test.TestPigRunner#testDisablePigCounters2





[jira] Subscription: PIG patch available

2016-02-04 Thread jira
Issue Subscription
Filter: PIG patch available (28 issues)

Subscriber: pigdaily

Key       Summary
PIG-4745  DataBag should protect content of passed list of tuples
https://issues.apache.org/jira/browse/PIG-4745
PIG-4734  TOMAP schema inferring breaks some scripts in type checking for bincond
https://issues.apache.org/jira/browse/PIG-4734
PIG-4690  Union with self replicate join will fail in Tez
https://issues.apache.org/jira/browse/PIG-4690
PIG-4686  Backend code should not call AvroStorageUtils.getPaths
https://issues.apache.org/jira/browse/PIG-4686
PIG-4684  Exception should be changed to warning when job diagnostics cannot be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656  Improve String serialization and comparator performance in BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4641  Print the instance of Object without using toString()
https://issues.apache.org/jira/browse/PIG-4641
PIG-4598  Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4581  thread safe issue in NodeIdGenerator
https://issues.apache.org/jira/browse/PIG-4581
PIG-4551  Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539  New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4455  Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
https://issues.apache.org/jira/browse/PIG-4455
PIG-4341  Add CMX support to pig.tmpfilecompression.codec
https://issues.apache.org/jira/browse/PIG-4341
PIG-4323  PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4111  Make Pig compiles with avro-1.7.7
https://issues.apache.org/jira/browse/PIG-4111
PIG-4002  Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
https://issues.apache.org/jira/browse/PIG-3866
PIG-3864  ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
https://issues.apache.org/jira/browse/PIG-3864
PIG-3851  Upgrade jline to 2.11
https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3587  add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384


[jira] [Resolved] (PIG-4793) AvroStorage issues during write into HDFS

2016-02-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-4793.
-
Resolution: Invalid

> AvroStorage issues during write into HDFS
> -
>
> Key: PIG-4793
> URL: https://issues.apache.org/jira/browse/PIG-4793
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: John Smith
>
> I created a simple Pig script that reads two Avro files, merges the two 
> relations, and stores the result into an output Avro file.
> I tried to store the output relation into an Avro file using:
>  store outputSet into 'avrostorage' using AvroStorage();
> A workaround was required because Pig has problems processing schemas 
> containing :: (maybe another bug?)
> With the workaround code below added, the resulting 'avrostorage' file was 
> generated.
> outputSet = foreach outputSet generate $0 as (name:chararray) , $1 as 
> (customerId:chararray), $2 as (VIN:chararray) , $3 as (Birthdate:chararray), 
> $4 as (Mileage:chararray) ,$5 as (Fuel_Consumption:chararray);
>  
> When I tried to store the Avro file with the schema definition using the 
> code below, a strange error occurred: https://bpaste.net/show/ccf0cbef06a9 
> (full log).
> ...
> 10.0.1.47:8050 2016-01-29 17:24:39,600 [main] ERROR 
> org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) 
> failed!
> ...
> STORE outputSet INTO '/avro-dest/Test-20160129-1401822' 
>  USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 
> 'schema', '')
> Sample data and pig script:
> https://drive.google.com/file/d/0B6RZ_9vVuTEcd01aWm9zczNUUWc/view
> I think these might be two important issues, could you please investigate?
> Thank you





[jira] [Updated] (PIG-4795) Flushing ObjectOutputStream before calling toByteArray on the underlying ByteArrayOutputStream

2016-02-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4795:

   Resolution: Fixed
 Assignee: emopers
 Hadoop Flags: Reviewed
Fix Version/s: 0.16.0
   Status: Resolved  (was: Patch Available)

I don't see any downside.

Patch committed to trunk. Thanks [~emopers]

> Flushing ObjectOutputStream before calling toByteArray on the underlying 
> ByteArrayOutputStream
> --
>
> Key: PIG-4795
> URL: https://issues.apache.org/jira/browse/PIG-4795
> Project: Pig
>  Issue Type: Bug
>Reporter: emopers
>Assignee: emopers
>Priority: Minor
>  Labels: easyfix, patch
> Fix For: 0.16.0
>
> Attachments: PIG-4795-0.patch
>
>
> In PigSplit.java
> {code}
> private void writeObject(Serializable obj, DataOutput os)
> throws IOException {
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> ObjectOutputStream oos = new ObjectOutputStream(baos);
> oos.writeObject(obj);
> byte[] bytes = baos.toByteArray();
> os.writeInt(bytes.length);
> os.write(bytes);
> }
> {code}
> When an ObjectOutputStream instance wraps an underlying ByteArrayOutputStream 
> instance,
> it is recommended to flush or close the ObjectOutputStream before invoking 
> the underlying instances's toByteArray(). Also, it is a good practice to call 
> flush/close explicitly as mentioned for example at 
> http://stackoverflow.com/questions/2984538/how-to-use-bytearrayoutputstream-and-dataoutputstream-simultaneously-java.
> The patch adds a flush method before calling toByteArray().
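As a sanity check, here is a hedged, self-contained sketch of the patched
helper — a standalone analogue of PigSplit.writeObject. The class name
FlushDemo and the round-trip check in main are illustrative, not part of the
patch:

```java
import java.io.*;

public class FlushDemo {
    // Patched shape: flush the ObjectOutputStream before reading the
    // underlying ByteArrayOutputStream, so any buffered bytes reach baos.
    static void writeObject(Serializable obj, DataOutput os) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(obj);
        oos.flush(); // the fix: push buffered data into baos before toByteArray()
        byte[] bytes = baos.toByteArray();
        os.writeInt(bytes.length);
        os.write(bytes);
    }

    public static void main(String[] args) throws Exception {
        // Round-trip a value through the helper to confirm nothing is lost:
        // read the 4-byte length prefix, then deserialize the payload.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeObject("hello", new DataOutputStream(sink));
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(sink.toByteArray()));
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        Object back = new ObjectInputStream(
                new ByteArrayInputStream(payload)).readObject();
        if (!"hello".equals(back)) throw new AssertionError("round-trip failed");
    }
}
```

In practice ObjectOutputStream's block-data buffer makes the flush a defensive
measure rather than a bug fix here, which matches the "good practice, no
specific error" framing in the comments below.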





[jira] [Updated] (PIG-4784) Enable "pig.disable.counter" for spark engine

2016-02-04 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4784:
--
Attachment: PIG-4784_2.patch

[~pallavi.rao]: attached PIG-4784_2.patch for your last review.

> Enable "pig.disable.counter“ for spark engine
> -
>
> Key: PIG-4784
> URL: https://issues.apache.org/jira/browse/PIG-4784
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4784.patch, PIG-4784_2.patch
>
>
> When pig.disable.counter is set to "true" in conf/pig.properties, the 
> counters that track the number of input and output records are disabled. 
> The following unit tests cover this behavior but currently fail:
> org.apache.pig.test.TestPigRunner#testDisablePigCounters
> org.apache.pig.test.TestPigRunner#testDisablePigCounters2





[jira] [Commented] (PIG-4766) Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread Pallavi Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133655#comment-15133655
 ] 

Pallavi Rao commented on PIG-4766:
--

Thanks [~xuefuz]. Thanks [~kellyzly] for the review.

> Ensure GroupBy is optimized for all algebraic Operations
> 
>
> Key: PIG-4766
> URL: https://issues.apache.org/jira/browse/PIG-4766
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Pallavi Rao
>Assignee: Pallavi Rao
>  Labels: spork
> Fix For: spark-branch
>
> Attachments: PIG-4766-v1.patch, PIG-4766-v2.patch, PIG-4766.patch
>
>






[jira] [Updated] (PIG-4766) Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-4766:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, Pallavi!

> Ensure GroupBy is optimized for all algebraic Operations
> 
>
> Key: PIG-4766
> URL: https://issues.apache.org/jira/browse/PIG-4766
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Pallavi Rao
>Assignee: Pallavi Rao
>  Labels: spork
> Fix For: spark-branch
>
> Attachments: PIG-4766-v1.patch, PIG-4766-v2.patch, PIG-4766.patch
>
>






[jira] [Updated] (PIG-4766) Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread Pallavi Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Rao updated PIG-4766:
-
Attachment: PIG-4766-v2.patch

Rebased patch.

> Ensure GroupBy is optimized for all algebraic Operations
> 
>
> Key: PIG-4766
> URL: https://issues.apache.org/jira/browse/PIG-4766
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Pallavi Rao
>Assignee: Pallavi Rao
>  Labels: spork
> Fix For: spark-branch
>
> Attachments: PIG-4766-v1.patch, PIG-4766-v2.patch, PIG-4766.patch
>
>






Re: Review Request 43044: PIG-4766 Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread Pallavi Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43044/
---

(Updated Feb. 5, 2016, 4:25 a.m.)


Review request for pig, Xianda Ke, liyun zhang, Mohit Sabharwal, and Xuefu 
Zhang.


Changes
---

Rebased patch


Bugs: PIG-4766
https://issues.apache.org/jira/browse/PIG-4766


Repository: pig-git


Description
---

PIG-4709 introduced Combiner optimization for Group By. However, the patch did 
not handle cases where constant/conditional expressions were used. It also did 
not handle limit.

This patch is to address those gaps.
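As an illustration of the gap being closed (the script shape below is
hypothetical, inferred from the description — an algebraic aggregate mixed
with a constant expression and a LIMIT):

```
-- Hypothetical group-by mixing an algebraic aggregate (COUNT) with a
-- constant expression and a LIMIT; per the description, PIG-4709's combiner
-- optimization previously skipped shapes like this on Spark.
a = LOAD 'input' AS (k:chararray, v:int);
b = GROUP a BY k;
c = FOREACH b GENERATE group, COUNT(a), 'const_tag';
d = LIMIT c 10;
STORE d INTO 'output';
```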


Diffs (updated)
-

  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java
 5fb49e2 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java
 d4b521a 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java
 a05d009 
  
src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
 5c0919f 
  
test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java 
0e45434 
  test/org/apache/pig/test/TestCombiner.java b2e81ac 

Diff: https://reviews.apache.org/r/43044/diff/


Testing
---

With this patch, all tests in TestCombiner pass.


Thanks,

Pallavi Rao



[jira] [Created] (PIG-4797) Analyze JOIN performance and improve the same.

2016-02-04 Thread Pallavi Rao (JIRA)
Pallavi Rao created PIG-4797:


 Summary: Analyze JOIN performance and improve the same.
 Key: PIG-4797
 URL: https://issues.apache.org/jira/browse/PIG-4797
 Project: Pig
  Issue Type: Improvement
  Components: spark
Reporter: Pallavi Rao
Assignee: Pallavi Rao








[jira] [Commented] (PIG-4795) Flushing ObjectOutputStream before calling toByteArray on the underlying ByteArrayOutputStream

2016-02-04 Thread emopers (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133469#comment-15133469
 ] 

emopers commented on PIG-4795:
--

No, I don't see any specific error, but as I said, it is good practice to 
add flush(), to be on the safe side.


> Flushing ObjectOutputStream before calling toByteArray on the underlying 
> ByteArrayOutputStream
> --
>
> Key: PIG-4795
> URL: https://issues.apache.org/jira/browse/PIG-4795
> Project: Pig
>  Issue Type: Bug
>Reporter: emopers
>Priority: Minor
>  Labels: easyfix, patch
> Attachments: PIG-4795-0.patch
>
>
> In PigSplit.java
> {code}
> private void writeObject(Serializable obj, DataOutput os)
> throws IOException {
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> ObjectOutputStream oos = new ObjectOutputStream(baos);
> oos.writeObject(obj);
> byte[] bytes = baos.toByteArray();
> os.writeInt(bytes.length);
> os.write(bytes);
> }
> {code}
> When an ObjectOutputStream instance wraps an underlying ByteArrayOutputStream 
> instance,
> it is recommended to flush or close the ObjectOutputStream before invoking 
> the underlying instances's toByteArray(). Also, it is a good practice to call 
> flush/close explicitly as mentioned for example at 
> http://stackoverflow.com/questions/2984538/how-to-use-bytearrayoutputstream-and-dataoutputstream-simultaneously-java.
> The patch adds a flush method before calling toByteArray().





[jira] [Commented] (PIG-4766) Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133470#comment-15133470
 ] 

liyunzhang_intel commented on PIG-4766:
---

[~pallavi.rao]: +1. The patch needs a rebase because some of its code 
conflicts with the check-in of PIG-4611.

> Ensure GroupBy is optimized for all algebraic Operations
> 
>
> Key: PIG-4766
> URL: https://issues.apache.org/jira/browse/PIG-4766
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Pallavi Rao
>Assignee: Pallavi Rao
>  Labels: spork
> Fix For: spark-branch
>
> Attachments: PIG-4766-v1.patch, PIG-4766.patch
>
>






[jira] [Commented] (PIG-4795) Flushing ObjectOutputStream before calling toByteArray on the underlying ByteArrayOutputStream

2016-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133445#comment-15133445
 ] 

Daniel Dai commented on PIG-4795:
-

Do you see a specific error in your environment?

> Flushing ObjectOutputStream before calling toByteArray on the underlying 
> ByteArrayOutputStream
> --
>
> Key: PIG-4795
> URL: https://issues.apache.org/jira/browse/PIG-4795
> Project: Pig
>  Issue Type: Bug
>Reporter: emopers
>Priority: Minor
>  Labels: easyfix, patch
> Attachments: PIG-4795-0.patch
>
>
> In PigSplit.java
> {code}
> private void writeObject(Serializable obj, DataOutput os)
> throws IOException {
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> ObjectOutputStream oos = new ObjectOutputStream(baos);
> oos.writeObject(obj);
> byte[] bytes = baos.toByteArray();
> os.writeInt(bytes.length);
> os.write(bytes);
> }
> {code}
> When an ObjectOutputStream instance wraps an underlying ByteArrayOutputStream 
> instance,
> it is recommended to flush or close the ObjectOutputStream before invoking 
> the underlying instances's toByteArray(). Also, it is a good practice to call 
> flush/close explicitly as mentioned for example at 
> http://stackoverflow.com/questions/2984538/how-to-use-bytearrayoutputstream-and-dataoutputstream-simultaneously-java.
> The patch adds a flush method before calling toByteArray().





[jira] [Commented] (PIG-4789) Pig on TEZ creates wrong result with replicated join

2016-02-04 Thread Krzysztof Indyk (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132821#comment-15132821
 ] 

Krzysztof Indyk commented on PIG-4789:
--

It looks like it's same bug as in https://issues.apache.org/jira/browse/PIG-4695

> Pig on TEZ creates wrong result with replicated join
> 
>
> Key: PIG-4789
> URL: https://issues.apache.org/jira/browse/PIG-4789
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.15.0
>Reporter: Michael Prim
>Priority: Critical
> Attachments: tez_bug.pig, tez_bug_input1.csv, tez_bug_input2.csv, 
> tez_bug_input3.csv
>
>
> Please find below a minimal example of a Pig script that uses splits and 
> replicated joins, where the output differs between MapReduce and TEZ as the 
> execution engine. The attachment also contains the sample input data.
> The expected output, as created by MapReduce engine is:
> {code}
> (id1,123,A,)
> (id2,234,,B)
> (id3,456,,)
> (id4,567,A,)
> {code}
> whereas TEZ produces
> {code}
> (id1,123,A,A)
> (id2,234,B,B)
> (id3,456,,)
> (id4,567,A,A)
> {code}
> Removing the {{USING 'replicated'}} and using a regular join yields correct 
> results. I am not sure if this is a Pig issue or a TEZ issue. However, as 
> this issue can silently lead to data corruption, I rated it critical. So far, 
> searching has not turned up a similar bug or anyone aware of it.
> {code}
> classdata = LOAD '/tez_bug_input1.csv' USING PigStorage(',') AS 
> (classid:chararray, class:chararray);
> data = LOAD '/tez_bug_input2.csv' USING PigStorage(',') AS 
> (eventid:chararray, classid:chararray);
> basedata = LOAD '/tez_bug_input3.csv' USING PigStorage(',') AS 
> (eventid:chararray, foo:int);
> dataJclassdata = JOIN classdata BY classid, data BY classid;
> SPLIT dataJclassdata INTO classA IF class == 'A', classB IF class == 'B';
> dataA = JOIN basedata BY eventid LEFT OUTER, classA BY data::eventid USING 
> 'replicated';
> dataA = foreach dataA generate basedata::eventid as eventid
>   , basedata::foo as foo
>   , classA::classdata::class as classA;
> dataB = JOIN dataA BY eventid LEFT OUTER, classB BY eventid USING 
> 'replicated';
> dataB = foreach dataB generate dataA::eventid as eventid
>   , dataA::foo as foo
>   , dataA::classA as classA
> , classB::classdata::class as classB;
> DUMP dataB;
> {code}





[jira] [Commented] (PIG-4793) AvroStorage issues during write into HDFS

2016-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132785#comment-15132785
 ] 

Daniel Dai commented on PIG-4793:
-

The one in the builtin is newer and was introduced in PIG-3015. Yes, we should 
deprecate the one in piggybank.
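For readers hitting the same confusion, a side-by-side sketch of the two
classes discussed (class paths and the piggybank option strings are taken from
this thread; the builtin call form is an assumption):

```
-- Newer builtin storer (introduced in PIG-3015); resolved without a full path:
STORE outputSet INTO '/avro-dest/out' USING AvroStorage();

-- Older piggybank storer (candidate for deprecation); takes a different
-- option set, e.g. 'no_schema_check' and 'schema':
STORE outputSet INTO '/avro-dest/out'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check',
        'schema', '');
```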

> AvroStorage issues during write into HDFS
> -
>
> Key: PIG-4793
> URL: https://issues.apache.org/jira/browse/PIG-4793
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: John Smith
>
> I created a simple Pig script that reads two Avro files, merges the two 
> relations, and stores the result into an output Avro file.
> I tried to store the output relation into an Avro file using:
>  store outputSet into 'avrostorage' using AvroStorage();
> A workaround was required because Pig has problems processing schemas 
> containing :: (maybe another bug?)
> With the workaround code below added, the resulting 'avrostorage' file was 
> generated.
> outputSet = foreach outputSet generate $0 as (name:chararray) , $1 as 
> (customerId:chararray), $2 as (VIN:chararray) , $3 as (Birthdate:chararray), 
> $4 as (Mileage:chararray) ,$5 as (Fuel_Consumption:chararray);
>  
> When I tried to store the Avro file with the schema definition using the 
> code below, a strange error occurred: https://bpaste.net/show/ccf0cbef06a9 
> (full log).
> ...
> 10.0.1.47:8050 2016-01-29 17:24:39,600 [main] ERROR 
> org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) 
> failed!
> ...
> STORE outputSet INTO '/avro-dest/Test-20160129-1401822' 
>  USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 
> 'schema', '')
> Sample data and pig script:
> https://drive.google.com/file/d/0B6RZ_9vVuTEcd01aWm9zczNUUWc/view
> I think these might be two important issues, could you please investigate?
> Thank you





[jira] [Commented] (PIG-4789) Pig on TEZ creates wrong result with replicated join

2016-02-04 Thread Michael Prim (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132483#comment-15132483
 ] 

Michael Prim commented on PIG-4789:
---

Is there an ETA for a new 0.16.0 or 0.15.1 release that would include these 
fixes?

Also, if you have an idea which patch fixes this, we could consider 
cherry-picking it; running trunk in production is unfortunately not an option.

> Pig on TEZ creates wrong result with replicated join
> 
>
> Key: PIG-4789
> URL: https://issues.apache.org/jira/browse/PIG-4789
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.15.0
>Reporter: Michael Prim
>Priority: Critical
> Attachments: tez_bug.pig, tez_bug_input1.csv, tez_bug_input2.csv, 
> tez_bug_input3.csv
>
>
> Please find below a minimal example of a Pig script that uses splits and 
> replicated joins, where the output differs between MapReduce and TEZ as the 
> execution engine. The attachment also contains the sample input data.
> The expected output, as created by MapReduce engine is:
> {code}
> (id1,123,A,)
> (id2,234,,B)
> (id3,456,,)
> (id4,567,A,)
> {code}
> whereas TEZ produces
> {code}
> (id1,123,A,A)
> (id2,234,B,B)
> (id3,456,,)
> (id4,567,A,A)
> {code}
> Removing the {{USING 'replicated'}} and using a regular join yields correct 
> results. I am not sure if this is a Pig issue or a TEZ issue. However, as 
> this issue can silently lead to data corruption, I rated it critical. So far, 
> searching has not turned up a similar bug or anyone aware of it.
> {code}
> classdata = LOAD '/tez_bug_input1.csv' USING PigStorage(',') AS 
> (classid:chararray, class:chararray);
> data = LOAD '/tez_bug_input2.csv' USING PigStorage(',') AS 
> (eventid:chararray, classid:chararray);
> basedata = LOAD '/tez_bug_input3.csv' USING PigStorage(',') AS 
> (eventid:chararray, foo:int);
> dataJclassdata = JOIN classdata BY classid, data BY classid;
> SPLIT dataJclassdata INTO classA IF class == 'A', classB IF class == 'B';
> dataA = JOIN basedata BY eventid LEFT OUTER, classA BY data::eventid USING 
> 'replicated';
> dataA = foreach dataA generate basedata::eventid as eventid
>   , basedata::foo as foo
>   , classA::classdata::class as classA;
> dataB = JOIN dataA BY eventid LEFT OUTER, classB BY eventid USING 
> 'replicated';
> dataB = foreach dataB generate dataA::eventid as eventid
>   , dataA::foo as foo
>   , dataA::classA as classA
> , classB::classdata::class as classB;
> DUMP dataB;
> {code}





[jira] [Commented] (PIG-4789) Pig on TEZ creates wrong result with replicated join

2016-02-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132478#comment-15132478
 ] 

Rohini Palaniswamy commented on PIG-4789:
-

Tested it and trunk returns the right results, but I am not sure which of the 
jiras fixed this issue as I can't remember fixing something for this particular 
case.

> Pig on TEZ creates wrong result with replicated join
> 
>
> Key: PIG-4789
> URL: https://issues.apache.org/jira/browse/PIG-4789
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.15.0
>Reporter: Michael Prim
>Priority: Critical
> Attachments: tez_bug.pig, tez_bug_input1.csv, tez_bug_input2.csv, 
> tez_bug_input3.csv
>
>
> Please find below a minimal example of a Pig script that uses splits and 
> replicated joins, where the output differs between MapReduce and TEZ as the 
> execution engine. The attachment also contains the sample input data.
> The expected output, as created by MapReduce engine is:
> {code}
> (id1,123,A,)
> (id2,234,,B)
> (id3,456,,)
> (id4,567,A,)
> {code}
> whereas TEZ produces
> {code}
> (id1,123,A,A)
> (id2,234,B,B)
> (id3,456,,)
> (id4,567,A,A)
> {code}
> Removing the {{USING 'replicated'}} and using a regular join yields correct 
> results. I am not sure if this is a Pig issue or a TEZ issue. However, as 
> this issue can silently lead to data corruption, I rated it critical. So far, 
> searching has not turned up a similar bug or anyone aware of it.
> {code}
> classdata = LOAD '/tez_bug_input1.csv' USING PigStorage(',') AS 
> (classid:chararray, class:chararray);
> data = LOAD '/tez_bug_input2.csv' USING PigStorage(',') AS 
> (eventid:chararray, classid:chararray);
> basedata = LOAD '/tez_bug_input3.csv' USING PigStorage(',') AS 
> (eventid:chararray, foo:int);
> dataJclassdata = JOIN classdata BY classid, data BY classid;
> SPLIT dataJclassdata INTO classA IF class == 'A', classB IF class == 'B';
> dataA = JOIN basedata BY eventid LEFT OUTER, classA BY data::eventid USING 
> 'replicated';
> dataA = foreach dataA generate basedata::eventid as eventid
>   , basedata::foo as foo
>   , classA::classdata::class as classA;
> dataB = JOIN dataA BY eventid LEFT OUTER, classB BY eventid USING 
> 'replicated';
> dataB = foreach dataB generate dataA::eventid as eventid
>   , dataA::foo as foo
>   , dataA::classA as classA
> , classB::classdata::class as classB;
> DUMP dataB;
> {code}





[jira] [Commented] (PIG-4789) Pig on TEZ creates wrong result with replicated join

2016-02-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132465#comment-15132465
 ] 

Rohini Palaniswamy commented on PIG-4789:
-

Can you test whether you still get wrong results with the current trunk? A 
couple of fixes have gone in.

> Pig on TEZ creates wrong result with replicated join
> 
>
> Key: PIG-4789
> URL: https://issues.apache.org/jira/browse/PIG-4789
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.15.0
>Reporter: Michael Prim
>Priority: Critical
> Attachments: tez_bug.pig, tez_bug_input1.csv, tez_bug_input2.csv, 
> tez_bug_input3.csv
>
>
> Please find below a minimal example of a Pig script that uses splits and 
> replicated joins, where the output differs between MapReduce and TEZ as the 
> execution engine. The attachment also contains the sample input data.
> The expected output, as created by MapReduce engine is:
> {code}
> (id1,123,A,)
> (id2,234,,B)
> (id3,456,,)
> (id4,567,A,)
> {code}
> whereas TEZ produces
> {code}
> (id1,123,A,A)
> (id2,234,B,B)
> (id3,456,,)
> (id4,567,A,A)
> {code}
> Removing the {{USING 'replicated'}} and using a regular join yields correct 
> results. I am not sure if this is a Pig issue or a TEZ issue. However, as 
> this issue can silently lead to data corruption, I rated it critical. So far, 
> searching has not turned up a similar bug or anyone aware of it.
> {code}
> classdata = LOAD '/tez_bug_input1.csv' USING PigStorage(',') AS 
> (classid:chararray, class:chararray);
> data = LOAD '/tez_bug_input2.csv' USING PigStorage(',') AS 
> (eventid:chararray, classid:chararray);
> basedata = LOAD '/tez_bug_input3.csv' USING PigStorage(',') AS 
> (eventid:chararray, foo:int);
> dataJclassdata = JOIN classdata BY classid, data BY classid;
> SPLIT dataJclassdata INTO classA IF class == 'A', classB IF class == 'B';
> dataA = JOIN basedata BY eventid LEFT OUTER, classA BY data::eventid USING 
> 'replicated';
> dataA = foreach dataA generate basedata::eventid as eventid
>   , basedata::foo as foo
>   , classA::classdata::class as classA;
> dataB = JOIN dataA BY eventid LEFT OUTER, classB BY eventid USING 
> 'replicated';
> dataB = foreach dataB generate dataA::eventid as eventid
>   , dataA::foo as foo
>   , dataA::classA as classA
> , classB::classdata::class as classB;
> DUMP dataB;
> {code}





[jira] [Commented] (PIG-4793) AvroStorage issues during write into HDFS

2016-02-04 Thread John Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132091#comment-15132091
 ] 

John Smith commented on PIG-4793:
-

Well, in that case I don't understand why there is another class, 
org.apache.pig.piggybank.storage.avro.AvroStorage(), which also takes 
different parameters; for example, the builtin AvroStorage() doesn't accept 
parameters such as 'no_schema_check' and 'schema'.

> AvroStorage issues during write into HDFS
> -
>
> Key: PIG-4793
> URL: https://issues.apache.org/jira/browse/PIG-4793
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: John Smith
>
> I created a simple Pig script that reads two Avro files, merges the two 
> relations, and stores the result into an output Avro file.
> I tried to store the output relation into an Avro file using:
>  store outputSet into 'avrostorage' using AvroStorage();
> A workaround was required because Pig has problems processing schemas 
> containing :: (maybe another bug?)
> With the workaround code below added, the resulting 'avrostorage' file was 
> generated.
> outputSet = foreach outputSet generate $0 as (name:chararray) , $1 as 
> (customerId:chararray), $2 as (VIN:chararray) , $3 as (Birthdate:chararray), 
> $4 as (Mileage:chararray) ,$5 as (Fuel_Consumption:chararray);
>  
> When I tried to store avro file with the schema definition using code below,
> strange error is occurring https://bpaste.net/show/ccf0cbef06a9 (Full log).
> ...
> 10.0.1.47:8050 2016-01-29 17:24:39,600 [main] ERROR 
> org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) 
> failed!
> ...
> STORE outputSet INTO '/avro-dest/Test-20160129-1401822' 
>  USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 
> 'schema', '')
> Sample data and pig script:
> https://drive.google.com/file/d/0B6RZ_9vVuTEcd01aWm9zczNUUWc/view
> I think these might be two important issues; could you please investigate?
> Thank you





Re: Review Request 43044: PIG-4766 Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread Pallavi Rao


> On Feb. 4, 2016, 8:56 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java,
> >  line 181
> > 
> >
> > can we consider all the tuples with a null key to be the same?
> > 
> > I explained the details on the JIRA page.

Answered your question on the JIRA :-)


> On Feb. 4, 2016, 8:56 a.m., kelly zhang wrote:
> > src/org/apache/pig/data/SelfSpillBag.java, line 55
> > 
> >
> > This modification is checked in PIG-4611.

Oh! I missed that change. Will revert it. Thanks for pointing it out.


- Pallavi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43044/#review117779
---


On Feb. 3, 2016, 6:23 a.m., Pallavi Rao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43044/
> ---
> 
> (Updated Feb. 3, 2016, 6:23 a.m.)
> 
> 
> Review request for pig, Xianda Ke, liyun zhang, Mohit Sabharwal, and Xuefu 
> Zhang.
> 
> 
> Bugs: PIG-4766
> https://issues.apache.org/jira/browse/PIG-4766
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> PIG-4709 introduced Combiner optimization for Group By. However, the patch 
> did not handle cases where constant/conditional expressions were used. It 
> also did not handle limit.
> 
> This patch is to address those gaps.
> 
> 
> Diffs
> -
> 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java
>  5fb49e2 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java
>  d4b521a 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java
>  a05d009 
>   
> src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
>  5c0919f 
>   src/org/apache/pig/data/SelfSpillBag.java 4e08b99 
>   
> test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java
>  0e45434 
>   test/org/apache/pig/test/TestCombiner.java b2e81ac 
> 
> Diff: https://reviews.apache.org/r/43044/diff/
> 
> 
> Testing
> ---
> 
> With this patch, all tests in TestCombiner pass.
> 
> 
> Thanks,
> 
> Pallavi Rao
> 
>



[jira] [Commented] (PIG-4766) Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread Pallavi Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132024#comment-15132024
 ] 

Pallavi Rao commented on PIG-4766:
--

[~kellyzly], yes, for group by, all tuples with "null" key values will be 
grouped together. See TestLocalRearrange.testMultiQueryJiraPig1194 for an 
example Pig script and the expected output.
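The semantics described above can be sketched outside Pig. This is a hypothetical Java illustration (not Pig internals): grouping behaves like a HashMap that permits a null key, so every null-key tuple collapses into one group.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullKeyGrouping {
    // Group values by key; HashMap permits a null key, so all tuples
    // with a null key fall into the same (single) group.
    static Map<String, List<Integer>> groupBy(List<Map.Entry<String, Integer>> tuples) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (Map.Entry<String, Integer> t : tuples) {
            groups.computeIfAbsent(t.getKey(), k -> new ArrayList<>()).add(t.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> in = Arrays.<Map.Entry<String, Integer>>asList(
                new AbstractMap.SimpleEntry<String, Integer>(null, 20),
                new AbstractMap.SimpleEntry<String, Integer>(null, 30),
                new AbstractMap.SimpleEntry<String, Integer>("a", 1));
        Map<String, List<Integer>> g = groupBy(in);
        System.out.println(g.get(null)); // [20, 30] -- both null-key tuples land in one group
    }
}
```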

> Ensure GroupBy is optimized for all algebraic Operations
> 
>
> Key: PIG-4766
> URL: https://issues.apache.org/jira/browse/PIG-4766
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Pallavi Rao
>Assignee: Pallavi Rao
>  Labels: spork
> Fix For: spark-branch
>
> Attachments: PIG-4766-v1.patch, PIG-4766.patch
>
>






[jira] [Created] (PIG-4796) Authenticate with Kerberos using a keytab file

2016-02-04 Thread Niels Basjes (JIRA)
Niels Basjes created PIG-4796:
-

 Summary: Authenticate with Kerberos using a keytab file
 Key: PIG-4796
 URL: https://issues.apache.org/jira/browse/PIG-4796
 Project: Pig
  Issue Type: New Feature
Reporter: Niels Basjes


When running in a Kerberos-secured environment, users are faced with the 
limitation that their jobs cannot run longer than the (remaining) ticket 
lifetime of their Kerberos tickets. In the environment I work in, these tickets 
expire after 10 hours, thus limiting the maximum job duration to at most 10 
hours (which is a problem).

In the Hadoop tooling there is a feature where you can authenticate using a 
Kerberos keytab file (essentially a file that contains the encrypted form of 
the Kerberos principal and password). Using this, the running application can 
request new tickets from the Kerberos server when the initial tickets expire.

In my Java/Hadoop applications I commonly include these two lines:
{code}
System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
UserGroupInformation.loginUserFromKeytab("nbas...@xx.net", 
"/home/nbasjes/.krb/nbasjes.keytab");
{code}

This way I have run an Apache Flink based application for more than 170 hours 
(about a week) on the Kerberos-secured YARN cluster.

What I propose is a feature that lets me set the relevant Kerberos values in my 
pig script and, from there, run a pig job for many days on the secured cluster.

Proposal how this can look in a pig script:
{code}
SET java.security.krb5.conf '/etc/krb5.conf'
SET job.security.krb5.principal 'nbas...@xx.net'
SET job.security.krb5.keytab '/home/nbasjes/.krb/nbasjes.keytab'
{code}
So if all of these are set (or at least the last two), then the aforementioned 
UserGroupInformation.loginUserFromKeytab method is called before submitting the 
job to the cluster.
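A minimal sketch of the proposed behavior, assuming the property names above. Only the guard is implemented here so the example is self-contained; the actual Hadoop call, UserGroupInformation.loginUserFromKeytab(principal, keytab), appears only in a comment.

```java
import java.util.Properties;

public class KeytabLoginGuard {
    // Returns true when the proposed principal and keytab properties are both set,
    // i.e. when the engine should invoke
    // UserGroupInformation.loginUserFromKeytab(principal, keytab) before job submission.
    static boolean shouldLoginFromKeytab(Properties props) {
        String principal = props.getProperty("job.security.krb5.principal");
        String keytab = props.getProperty("job.security.krb5.keytab");
        return principal != null && keytab != null;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        System.out.println(shouldLoginFromKeytab(props)); // false: principal/keytab missing

        props.setProperty("job.security.krb5.principal", "user@EXAMPLE.NET");
        props.setProperty("job.security.krb5.keytab", "/home/user/user.keytab");
        System.out.println(shouldLoginFromKeytab(props)); // true: login would be performed
    }
}
```
The class and method names are hypothetical; the sketch only shows when the keytab login should be triggered, as the proposal describes.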






Re: Review Request 43044: PIG-4766 Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread kelly zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43044/#review117779
---




src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java
 (line 181)


can we consider all the tuples with a null key to be the same?

I explained the details on the JIRA page.



src/org/apache/pig/data/SelfSpillBag.java (line 55)


This modification is checked in PIG-4611.


- kelly zhang


On Feb. 3, 2016, 6:23 a.m., Pallavi Rao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43044/
> ---
> 
> (Updated Feb. 3, 2016, 6:23 a.m.)
> 
> 
> Review request for pig, Xianda Ke, liyun zhang, Mohit Sabharwal, and Xuefu 
> Zhang.
> 
> 
> Bugs: PIG-4766
> https://issues.apache.org/jira/browse/PIG-4766
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> PIG-4709 introduced Combiner optimization for Group By. However, the patch 
> did not handle cases where constant/conditional expressions were used. It 
> also did not handle limit.
> 
> This patch is to address those gaps.
> 
> 
> Diffs
> -
> 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java
>  5fb49e2 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java
>  d4b521a 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java
>  a05d009 
>   
> src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
>  5c0919f 
>   src/org/apache/pig/data/SelfSpillBag.java 4e08b99 
>   
> test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java
>  0e45434 
>   test/org/apache/pig/test/TestCombiner.java b2e81ac 
> 
> Diff: https://reviews.apache.org/r/43044/diff/
> 
> 
> Testing
> ---
> 
> With this patch, all tests in TestCombiner pass.
> 
> 
> Thanks,
> 
> Pallavi Rao
> 
>



[jira] [Commented] (PIG-4766) Ensure GroupBy is optimized for all algebraic Operations

2016-02-04 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131978#comment-15131978
 ] 

liyunzhang_intel commented on PIG-4766:
---

[~pallavi.rao]:  PIG-4766-1.patch looks good except for the following problem.
org.apache.pig.backend.hadoop.executionengine.spark.converter.ReduceByConverter.MergeValuesFunction
{code}
public Tuple apply(Tuple v1, Tuple v2) {
    LOG.debug("MergeValuesFunction in : " + v1 + " , " + v2);
    Tuple result = tf.newTuple(2);
    DataBag bag = DefaultBagFactory.getInstance().newDefaultBag();
    Tuple t = new DefaultTuple();
    try {
        // Package the input tuples so they can be processed by Algebraic functions.
        Object key = v1.get(0);
        if (key == null) {
            key = "";
        } else {
            result.set(0, key);
        }
{code}
Is it OK that tuples with a null key are considered the same? For example, two 
tuples (,20) and (,20) would be considered to have the same key, and 
poReduce.getNext() would be executed.
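The hazard raised above can be shown in isolation. A hypothetical sketch of why substituting "" for a null key is lossy: after the substitution, a null-key tuple and an empty-chararray-key tuple become indistinguishable (the helper name is made up for illustration).

```java
public class NullKeyNormalization {
    // Mirrors the snippet above: a null grouping key is replaced by the empty string.
    static String normalizeKey(Object key) {
        return key == null ? "" : key.toString();
    }

    public static void main(String[] args) {
        // A tuple whose key is genuinely null and a tuple whose key is the
        // empty chararray now merge into the same reduce group.
        System.out.println(normalizeKey(null).equals(normalizeKey(""))); // true
    }
}
```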
 


> Ensure GroupBy is optimized for all algebraic Operations
> 
>
> Key: PIG-4766
> URL: https://issues.apache.org/jira/browse/PIG-4766
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Pallavi Rao
>Assignee: Pallavi Rao
>  Labels: spork
> Fix For: spark-branch
>
> Attachments: PIG-4766-v1.patch, PIG-4766.patch
>
>






[jira] [Commented] (PIG-4793) AvroStorage issues during write into HDFS

2016-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131896#comment-15131896
 ] 

Daniel Dai commented on PIG-4793:
-

[http://pig.apache.org/docs/r0.15.0/api/org/apache/pig/builtin/AvroStorage.html#AvroStorage(java.lang.String,%20java.lang.String)]:
-doublecolons Option to translate Pig schema names with double colons to names 
with double underscores (default is false).
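A sketch of what that option does to field names. The rename itself is just a string substitution; the helper name below is hypothetical, not AvroStorage's internal API.

```java
public class DoubleColonRename {
    // Pig disambiguates joined fields with '::' (e.g. basedata::eventid), but
    // '::' is not legal in an Avro name; '-doublecolons' rewrites it as '__'.
    static String toAvroName(String pigAlias) {
        return pigAlias.replace("::", "__");
    }

    public static void main(String[] args) {
        System.out.println(toAvroName("basedata::eventid")); // basedata__eventid
    }
}
```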

> AvroStorage issues during write into HDFS
> -
>
> Key: PIG-4793
> URL: https://issues.apache.org/jira/browse/PIG-4793
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: John Smith
>
> Dear all,
> I created a simple Pig script that reads two Avro files, merges the two 
> relations, and stores the result into an output Avro file.
> I tried to store the output relation into an Avro file using:
>  store outputSet into 'avrostorage' using AvroStorage();
> Some workaround was required because Pig has problems processing schemas 
> containing :: (maybe another bug?).
> After adding the code below, the resulting 'avrostorage' file was generated:
> outputSet = foreach outputSet generate $0 as (name:chararray) , $1 as 
> (customerId:chararray), $2 as (VIN:chararray) , $3 as (Birthdate:chararray), 
> $4 as (Mileage:chararray) ,$5 as (Fuel_Consumption:chararray);
>  
> When I tried to store the Avro file with the schema definition using the code 
> below, a strange error occurred: https://bpaste.net/show/ccf0cbef06a9 (full log).
> ...
> 10.0.1.47:8050 2016-01-29 17:24:39,600 [main] ERROR 
> org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) 
> failed!
> ...
> STORE outputSet INTO '/avro-dest/Test-20160129-1401822' 
>  USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 
> 'schema', '')
> Sample data and pig script:
> https://drive.google.com/file/d/0B6RZ_9vVuTEcd01aWm9zczNUUWc/view
> I think these might be two important issues; could you please investigate?
> Thank you


