[jira] Subscription: PIG patch available

2017-05-07 Thread jira
Issue Subscription
Filter: PIG patch available (41 issues)

Subscriber: pigdaily

Key         Summary
PIG-5228    Orc_2 is failing with spark exec type
            https://issues.apache.org/jira/browse/PIG-5228
PIG-5225    Several unit tests are not annotated with @Test
            https://issues.apache.org/jira/browse/PIG-5225
PIG-5218    Jyhton_Checkin_3 fails with spark exec type
            https://issues.apache.org/jira/browse/PIG-5218
PIG-5207    BugFix e2e tests fail on spark
            https://issues.apache.org/jira/browse/PIG-5207
PIG-5199    exclude jline in spark dependency
            https://issues.apache.org/jira/browse/PIG-5199
PIG-5194    HiveUDF fails with Spark exec type
            https://issues.apache.org/jira/browse/PIG-5194
PIG-5186    Support aggregate warnings with Spark engine
            https://issues.apache.org/jira/browse/PIG-5186
PIG-5185    Job name show "DefaultJobName" when running a Python script
            https://issues.apache.org/jira/browse/PIG-5185
PIG-5184    set command to view value of a variable
            https://issues.apache.org/jira/browse/PIG-5184
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5135    HDFS bytes read stats are always 0 in Spark mode
            https://issues.apache.org/jira/browse/PIG-5135
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4748    DateTimeWritable forgets Chronology
            https://issues.apache.org/jira/browse/PIG-4748
PIG-4745    DataBag should protect content of passed list of tuples
            https://issues.apache.org/jira/browse/PIG-4745
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rol

[jira] [Updated] (PIG-5215) Merge changes from review board to spark branch

2017-05-07 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-5215:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: PIG-4059)

> Merge changes from review board to spark branch
> ---
>
> Key: PIG-5215
> URL: https://issues.apache.org/jira/browse/PIG-5215
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-5215.1.patch, PIG-5215.3.patch, PIG-5215.patch
>
>
> There are comments from the community in the 
> [review board|https://reviews.apache.org/r/57317/]. After the review is 
> closed, merge these changes into the spark branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PIG-5218) Jyhton_Checkin_3 fails with spark exec type

2017-05-07 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000247#comment-16000247
 ] 

liyunzhang_intel commented on PIG-5218:
---

[~rohini]: committed to the branch. [~szita]: thanks for the contribution.

> Jyhton_Checkin_3 fails with spark exec type
> ---
>
> Key: PIG-5218
> URL: https://issues.apache.org/jira/browse/PIG-5218
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Fix For: spark-branch
>
> Attachments: PIG-5218.0.patch, PIG-5218.1.patch
>
>
> Exception observed:
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.commons.logging.impl.SLF4JLocationAwareLog cannot be cast to 
> org.apache.commons.logging.impl.Log4JLogger
> at 
> org.apache.hadoop.test.GenericTestUtils.setLogLevel(GenericTestUtils.java:107)
> at 
> org.apache.hadoop.fs.FileContextCreateMkdirBaseTest.(FileContextCreateMkdirBaseTest.java:60)
> ... 29 more
> {code}
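
The trace above shows an unconditional downcast from one logging backend to another. The patch itself is not reproduced in this thread, so the following is only a minimal sketch of why such a cast fails and how an instanceof guard avoids the exception. All type and method names in it (Log, Log4jStyleLog, Slf4jStyleLog, setLevelUnchecked, setLevelIfSupported) are hypothetical stand-ins, not the commons-logging or Hadoop classes named in the trace.

```java
public class LogCastSketch {
    /** Hypothetical stand-in for a logging facade interface. */
    interface Log {
        String backendName();
    }

    /** Hypothetical backend that supports changing the log level. */
    static final class Log4jStyleLog implements Log {
        private String level = "INFO";

        public String backendName() { return "log4j-style"; }

        void setLevel(String newLevel) { this.level = newLevel; }

        String getLevel() { return level; }
    }

    /** Hypothetical backend that does not expose a level setter. */
    static final class Slf4jStyleLog implements Log {
        public String backendName() { return "slf4j-style"; }
    }

    /** Unconditional downcast: throws ClassCastException for any other backend. */
    static void setLevelUnchecked(Log log, String level) {
        ((Log4jStyleLog) log).setLevel(level);
    }

    /** Guarded variant: returns false instead of throwing when the backend differs. */
    static boolean setLevelIfSupported(Log log, String level) {
        if (log instanceof Log4jStyleLog) {
            ((Log4jStyleLog) log).setLevel(level);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Log slf4j = new Slf4jStyleLog();
        try {
            setLevelUnchecked(slf4j, "DEBUG");
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed for " + slf4j.backendName());
        }
        System.out.println(setLevelIfSupported(slf4j, "DEBUG"));
        System.out.println(setLevelIfSupported(new Log4jStyleLog(), "DEBUG"));
    }
}
```

Whether the real failure is better addressed by a guard like this or by changing which logging binding ends up on the test classpath cannot be determined from this message alone.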





[jira] [Commented] (PIG-5228) Orc_2 is failing with spark exec type

2017-05-07 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000233#comment-16000233
 ] 

Rohini Palaniswamy commented on PIG-5228:
-

bq.  I think it may be dependent on HashMap implementation and thus JDK version 
as well.
   When running on different JDK versions it could be an issue, as the HashMap 
implementation changed between 1.6 and 1.7. For the same JDK version, insertion 
behavior is usually consistent.

bq. My wild guess is that either MR or Spark makes an extra filling of a map 
somewhere under the hood and that's where the difference comes from.
  The ordering of entries in a map is not something we guarantee. But can you 
still try to find out why it is happening? I am surprised you are running into 
this with just simple load and store statements on the same JDK version. It 
could have something to do with serialization as well.
  
bq. so I've created a new test case where we project by each key from the map.
   If figuring out the cause of the change takes more time, I am fine with the 
current patch, which has a separate test for Spark.

> Orc_2 is failing with spark exec type
> -
>
> Key: PIG-5228
> URL: https://issues.apache.org/jira/browse/PIG-5228
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Adam Szita
>Assignee: Adam Szita
> Fix For: spark-branch
>
> Attachments: PIG-5228.0.patch
>
>
> This test is failing due to a mismatch between the actual and expected 
> results. The difference is only in the order of entries in Pig maps, such as:
> Actual:
> {code}
> [name#alice, age#18]...
> {code}
> Expected:
> {code}
> [age#18, name#alice]...
> {code}
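
Since the two renderings above differ only in entry order, one way to check such output is to compare the maps by content instead of by string. The sketch below is an illustration only and is not the approach taken in the attached patch: the class MapOrderCheck and its helpers parsePigMap and sameEntries are hypothetical names, and the parser assumes flat maps whose keys and values contain no ',' or nested brackets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapOrderCheck {
    /**
     * Hypothetical helper: parses a flat rendering such as "[name#alice, age#18]"
     * into a map. Assumes keys and values contain no ',' and no nested brackets.
     */
    static Map<String, String> parsePigMap(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        String body = text.trim();
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1);
        }
        if (body.trim().isEmpty()) {
            return result;
        }
        for (String entry : body.split(",")) {
            // Split on the first '#' only, so values may themselves contain '#'.
            String[] keyValue = entry.trim().split("#", 2);
            result.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : null);
        }
        return result;
    }

    /** Map.equals compares entry sets, so iteration order does not matter. */
    static boolean sameEntries(String left, String right) {
        return parsePigMap(left).equals(parsePigMap(right));
    }

    public static void main(String[] args) {
        String actual = "[name#alice, age#18]";
        String expected = "[age#18, name#alice]";
        System.out.println(actual.equals(expected));        // false: strings differ
        System.out.println(sameEntries(actual, expected));  // true: same entries
    }
}
```

A content-based comparison like this sidesteps the ordering question but does not explain why the two execution engines produce different orders.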





[jira] [Commented] (PIG-5218) Jyhton_Checkin_3 fails with spark exec type

2017-05-07 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000222#comment-16000222
 ] 

Rohini Palaniswamy commented on PIG-5218:
-

+1. [~kellyzly], can you commit this into the spark branch?






[jira] [Resolved] (PIG-5197) Replace IndexedKey with PigNullableWritable in spark branch

2017-05-07 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-5197.
-
Resolution: Won't Fix

I don't have a suggestion for working around that. Closing it as Won't Fix for 
now. If there are performance and/or comparator issues, we will revisit.

> Replace IndexedKey with PigNullableWritable in spark branch
> ---
>
> Key: PIG-5197
> URL: https://issues.apache.org/jira/browse/PIG-5197
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-5197.patch
>
>
> IndexedKey and PigNullableWritable serve similar purposes. 
> The difference between the two is that IndexedKey contains an index and a 
> key, while PigNullableWritable contains an index, a key, and a value.
> Besides, the comparators for PigNullableWritable take care of many conditions 
> for the different data types, and IndexedKey can miss some of them. 
> We can try to replace IndexedKey with PigNullableWritable.


