[jira] Subscription: PIG patch available

2017-03-30 Thread jira
Issue Subscription
Filter: PIG patch available (41 issues)

Subscriber: pigdaily

Key Summary
PIG-5203 Partitioner E2E test fails on spark
https://issues.apache.org/jira/browse/PIG-5203
PIG-5199 exclude jline in spark dependency
https://issues.apache.org/jira/browse/PIG-5199
PIG-5194 HiveUDF fails with Spark exec type
https://issues.apache.org/jira/browse/PIG-5194
PIG-5186 Support aggregate warnings with Spark engine
https://issues.apache.org/jira/browse/PIG-5186
PIG-5185 Job name show "DefaultJobName" when running a Python script
https://issues.apache.org/jira/browse/PIG-5185
PIG-5184 set command to view value of a variable
https://issues.apache.org/jira/browse/PIG-5184
PIG-5176 Several ComputeSpec test cases fail
https://issues.apache.org/jira/browse/PIG-5176
PIG-5160 SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
https://issues.apache.org/jira/browse/PIG-5160
PIG-5153 Change of behavior in FLATTEN(map)
https://issues.apache.org/jira/browse/PIG-5153
PIG-5115 Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
https://issues.apache.org/jira/browse/PIG-5115
PIG-5106 Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
https://issues.apache.org/jira/browse/PIG-5106
PIG-5081 Can not run pig on spark source code distribution
https://issues.apache.org/jira/browse/PIG-5081
PIG-5080 Support store alias as spark table
https://issues.apache.org/jira/browse/PIG-5080
PIG-5057 IndexOutOfBoundsException when pig reducer processOnePackageOutput
https://issues.apache.org/jira/browse/PIG-5057
PIG-5029 Optimize sort case when data is skewed
https://issues.apache.org/jira/browse/PIG-5029
PIG-4926 Modify the content of start.xml for spark mode
https://issues.apache.org/jira/browse/PIG-4926
PIG-4913 Reduce jython function initiation during compilation
https://issues.apache.org/jira/browse/PIG-4913
PIG-4854 Merge spark branch to trunk
https://issues.apache.org/jira/browse/PIG-4854
PIG-4849 pig on tez will cause tez-ui to crash, because the content from timeline server is too long.
https://issues.apache.org/jira/browse/PIG-4849
PIG-4788 the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
https://issues.apache.org/jira/browse/PIG-4788
PIG-4750 REPLACE_MULTI should compile Pattern once and reuse it
https://issues.apache.org/jira/browse/PIG-4750
PIG-4748 DateTimeWritable forgets Chronology
https://issues.apache.org/jira/browse/PIG-4748
PIG-4745 DataBag should protect content of passed list of tuples
https://issues.apache.org/jira/browse/PIG-4745
PIG-4684 Exception should be changed to warning when job diagnostics cannot be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656 Improve String serialization and comparator performance in BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4598 Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4551 Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539 New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4515 org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4323 PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313 StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251 Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4002 Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952 PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911 Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877 Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873 Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3864 ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
https://issues.apache.org/jira/browse/PIG-3864
PIG-3668 COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-358

Jenkins build became unstable: Pig-trunk-commit #2468

2017-03-30 Thread Apache Jenkins Server
See 




[jira] [Created] (PIG-5205) Duplicate record key info in GlobalRearrangeConverter#ToGroupKeyValueFunction

2017-03-30 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created PIG-5205:
-

 Summary: Duplicate record key info in GlobalRearrangeConverter#ToGroupKeyValueFunction
 Key: PIG-5205
 URL: https://issues.apache.org/jira/browse/PIG-5205
 Project: Pig
  Issue Type: Sub-task
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel


In org.apache.pig.backend.hadoop.executionengine.spark.converter.GlobalRearrangeConverter.ToGroupKeyValueFunction:
{code}
    @Override
    public Tuple call(Tuple2<IndexedKey, Seq<Seq<Tuple>>> input) {
        try {
            // key, bags and i are set up earlier in the method (omitted in this excerpt)
            List<Iterator<Tuple>> tupleIterators = new ArrayList<Iterator<Tuple>>();
            for (int j = 0; j < bags.length; j++) {
                Seq<Tuple> bag = bags[j];
                Iterator<Tuple> iterator = JavaConversions
                        .asJavaCollection(bag).iterator();
                final int index = i;
                tupleIterators.add(new IteratorTransform<Tuple, Tuple>(
                        iterator) {
                    @Override
                    protected Tuple transform(Tuple next) {
                        try {
                            Tuple tuple = tf.newTuple(3);
                            tuple.set(0, index);
                            // we record duplicate key info here:
                            // for every record we later call out.set(0, key),
                            // so maybe the key info can be removed
                            tuple.set(1, key);
                            tuple.set(2, next);
                            return tuple;
                        } catch (ExecException e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
                ++i;
            }

            Tuple out = tf.newTuple(2);
            out.set(0, key);
            out.set(1, new IteratorUnion<Tuple>(tupleIterators.iterator()));
            if (LOG.isDebugEnabled()) {
                LOG.debug("ToGroupKeyValueFunction out " + out);
            }

            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
{code}
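
For illustration only, a minimal sketch (not part of any attached patch) of the two record layouts being discussed: the current inner tuple repeats the group key, while a slimmed-down variant would rely on the key that the outer tuple already records via out.set(0, key). The class and method names below are invented for this sketch; only Tuple, TupleFactory and ExecException come from Pig itself.

{code}
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class GroupKeyLayoutSketch {
    private static final TupleFactory tf = TupleFactory.getInstance();

    // Current layout: every inner record carries (index, key, value),
    // so the group key is duplicated once per record.
    static Tuple withDuplicateKey(int index, Object key, Tuple record) throws ExecException {
        Tuple t = tf.newTuple(3);
        t.set(0, index);
        t.set(1, key);   // duplicated: the outer (key, iterator) tuple already holds the key
        t.set(2, record);
        return t;
    }

    // Possible slimmed-down layout: (index, value) only, relying on the
    // key stored once in the outer tuple.
    static Tuple withoutDuplicateKey(int index, Tuple record) throws ExecException {
        Tuple t = tf.newTuple(2);
        t.set(0, index);
        t.set(1, record);
        return t;
    }
}
{code}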




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PIG-5199) exclude jline in spark dependency

2017-03-30 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-5199:
--
Status: Patch Available  (was: Open)

> exclude jline in spark dependency
> -
>
> Key: PIG-5199
> URL: https://issues.apache.org/jira/browse/PIG-5199
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-5199.patch
>
>
> While fixing PIG-5197 and running TestGrunt, the following exception was thrown:
> {code}
> [ERROR] Terminal initialization failed; falling back to unsupported
> java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
>     at jline.TerminalFactory.create(TerminalFactory.java:101)
>     at jline.TerminalFactory.get(TerminalFactory.java:159)
>     at jline.console.ConsoleReader.<init>(ConsoleReader.java:227)
>     at jline.console.ConsoleReader.<init>(ConsoleReader.java:219)
>     at jline.console.ConsoleReader.<init>(ConsoleReader.java:211)
>     at org.apache.pig.Main.run(Main.java:554)
>     at org.apache.pig.PigRunner.run(PigRunner.java:49)
>     at org.apache.pig.test.TestGrunt.testGruntUtf8(TestGrunt.java:1579)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> {code}
> I found this is because there are two jline jars with different versions:
> {code}
> find -name jline*jar
> ./build/ivy/lib/spark/jline-0.9.94.jar
> ./build/ivy/lib/Pig/jline-2.11.jar
> ./lib/spark/jline-0.9.94.jar
> ./lib/jline-2.11.jar
> {code}
> We need to exclude jline-0.9.94 from the Spark dependency.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PIG-5197) Replace IndexedKey with PigNullableWritable in spark branch

2017-03-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949920#comment-15949920
 ] 

Rohini Palaniswamy commented on PIG-5197:
-

Why not make PigNullableWritable implement Serializable?

> Replace IndexedKey with PigNullableWritable in spark branch
> ---
>
> Key: PIG-5197
> URL: https://issues.apache.org/jira/browse/PIG-5197
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
> Fix For: spark-branch
>
>
> The functions of IndexedKey and PigNullableWritable are similar.
> The difference between the two is that IndexedKey contains (index, key) while
> PigNullableWritable contains (index, key, value).
> Besides, the comparators for PigNullableWritable handle many conditions for the
> different data types, and IndexedKey may miss some of those.
> We can try to replace IndexedKey with PigNullableWritable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PIG-5197) Replace IndexedKey with PigNullableWritable in spark branch

2017-03-30 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949885#comment-15949885
 ] 

liyunzhang_intel commented on PIG-5197:
---

[~rohini]: I tried to replace IndexedKey with PigNullableWritable, but it failed 
because PigNullableWritable is not serializable. So I will keep IndexedKey in the 
spark package. Can you give me some suggestions?


Exception info:
{code}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0.0 in stage 32.0 mpl.io.NullableTuple
Serialization stack:
- object not serializable (class: org.apache.pig.impl.io.NullableTuple, 
value: Null: false in
[

{code}
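
For illustration, a minimal sketch (purely hypothetical, not tested against the spark branch) of one way around the not-serializable failure: a Serializable wrapper that delegates to the Writable's write()/readFields(), similar in spirit to Spark's SerializableWritable. The name SerializableWritableWrapper is invented for this sketch; only the Writable interface is real API.

{code}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.io.Writable;

// Hypothetical wrapper: lets a Writable (e.g. a PigNullableWritable subclass)
// cross Java-serialization boundaries by delegating to write()/readFields().
public class SerializableWritableWrapper<T extends Writable> implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient T writable;

    public SerializableWritableWrapper(T writable) {
        this.writable = writable;
    }

    public T get() {
        return writable;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeObject(writable.getClass());
        writable.write(out);        // ObjectOutputStream implements DataOutput
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        Class<T> clazz = (Class<T>) in.readObject();
        try {
            writable = clazz.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException("Cannot instantiate " + clazz, e);
        }
        writable.readFields(in);    // ObjectInputStream implements DataInput
    }
}
{code}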



> Replace IndexedKey with PigNullableWritable in spark branch
> ---
>
> Key: PIG-5197
> URL: https://issues.apache.org/jira/browse/PIG-5197
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
> Fix For: spark-branch
>
>
> The functions of IndexedKey and PigNullableWritable are similar.
> The difference between the two is that IndexedKey contains (index, key) while
> PigNullableWritable contains (index, key, value).
> Besides, the comparators for PigNullableWritable handle many conditions for the
> different data types, and IndexedKey may miss some of those.
> We can try to replace IndexedKey with PigNullableWritable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PIG-4677) Display failure information on stop on failure

2017-03-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4677:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the review Daniel.

> Display failure information on stop on failure
> --
>
> Key: PIG-4677
> URL: https://issues.apache.org/jira/browse/PIG-4677
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Mit Desai
>Assignee: Rohini Palaniswamy
> Fix For: 0.17.0
>
> Attachments: PIG-4677.2.patch, PIG-4677.3.patch, PIG-4677.4.patch, 
> PIG-4677-5.patch, PIG-4677.patch
>
>
> When the stop-on-failure option is specified, Pig abruptly exits without 
> displaying the job stats or failed-job information that it usually shows in 
> case of failures.
> {code}
> 2015-06-04 20:35:38,170 [uber-SubtaskRunner] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>   - 9% complete
> 2015-06-04 20:35:38,171 [uber-SubtaskRunner] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>   - Running jobs are 
> [job_1428329756093_3741748,job_1428329756093_3741752,job_1428329756093_3741753,job_1428329756093_3741754,job_1428329756093_3741756]
> 2015-06-04 20:35:40,201 [uber-SubtaskRunner] ERROR 
> org.apache.pig.tools.grunt.Grunt  - ERROR 6017: Job failed!
> Hadoop Job IDs executed by Pig: 
> job_1428329756093_3739816,job_1428329756093_3741752,job_1428329756093_3739814,job_1428329756093_3741748,job_1428329756093_3741756,job_1428329756093_3741753,job_1428329756093_3741754
> <<< Invocation of Main class completed <<<
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PIG-5201) Null handling on FLATTEN

2017-03-30 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949177#comment-15949177
 ] 

Koji Noguchi commented on PIG-5201:
---

[~daijy], appreciate your feedback on this when you have time.

> Null handling on FLATTEN
> 
>
> Key: PIG-5201
> URL: https://issues.apache.org/jira/browse/PIG-5201
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5201-v00-testonly.patch
>
>
> Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seems to produce incorrect 
> results.
> Test code/script to follow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PIG-5176) Several ComputeSpec test cases fail

2017-03-30 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948768#comment-15948768
 ] 

Nandor Kollar commented on PIG-5176:


Let's leave this open for now. I'll verify, but I'm afraid it is still an issue 
on my cluster. It might be related to the Spark version I use on my cluster; 
this needs more investigation.

> Several ComputeSpec test cases fail
> ---
>
> Key: PIG-5176
> URL: https://issues.apache.org/jira/browse/PIG-5176
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Fix For: spark-branch
>
> Attachments: PIG-5176.patch
>
>
> Several ComputeSpec test cases failed on my cluster:
> ComputeSpec_5 - ComputeSpec_13
> These scripts have a ship() part in the DEFINE, and the ship includes the 
> script file itself, so we add the same file to the Spark context twice. This 
> is not a problem with Hadoop, but it looks like Spark does not allow adding 
> the same filename twice:
> {code}
> Caused by: java.lang.IllegalArgumentException: requirement failed: File 
> PigStreamingDepend.pl already registered.
> at scala.Predef$.require(Predef.scala:233)
> at 
> org.apache.spark.rpc.netty.NettyStreamManager.addFile(NettyStreamManager.scala:69)
> at org.apache.spark.SparkContext.addFile(SparkContext.scala:1386)
> at org.apache.spark.SparkContext.addFile(SparkContext.scala:1348)
> at 
> org.apache.spark.api.java.JavaSparkContext.addFile(JavaSparkContext.scala:662)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addResourceToSparkJobWorkingDirectory(SparkLauncher.java:462)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.shipFiles(SparkLauncher.java:371)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addFilesToSparkJob(SparkLauncher.java:357)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.uploadResources(SparkLauncher.java:235)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:222)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> {code}
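
Not taken from the attached patch, just an illustrative sketch of the kind of guard that could avoid this failure: de-duplicate shipped paths before calling JavaSparkContext.addFile. ShippedFileTracker and addFileOnce are invented names; only JavaSparkContext.addFile is real Spark API.

{code}
import java.util.HashSet;
import java.util.Set;

import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical helper: remembers which paths were already shipped so the same
// file is never passed to JavaSparkContext.addFile twice (Spark rejects
// registering the same file name a second time).
public class ShippedFileTracker {
    private final Set<String> shippedPaths = new HashSet<String>();

    public void addFileOnce(JavaSparkContext sparkContext, String path) {
        if (shippedPaths.add(path)) {
            sparkContext.addFile(path);
        }
    }
}
{code}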



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)