[jira] [Commented] (PIG-5235) Typecast with as-clause fails for tuple/bag with an empty schema

2017-05-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024288#comment-16024288
 ] 

Daniel Dai commented on PIG-5235:
-

+1

> Typecast with as-clause fails for tuple/bag with an empty schema
> 
>
> Key: PIG-5235
> URL: https://issues.apache.org/jira/browse/PIG-5235
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
> Attachments: pig-5235-v01.patch
>
>
> The following script fails with trunk (0.17).
> {code}
> a = load 'test.txt' as (mytuple:tuple (), gpa:float);
> b = foreach a generate mytuple as (mytuple2:(name:int, age:double));
> store b into '/tmp/deleteme';
> {code}
> 2017-05-16 09:52:31,280 \[main] ERROR org.apache.pig.tools.grunt.Grunt - 
> ERROR 2999: Unexpected internal error. null
> (This is a continuation of the as-clause fix in PIG-2315 and the follow-up jira 
> PIG-4933)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PIG-5241) Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java

2017-05-24 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created PIG-5241:
-

 Summary: Specify the hdfs path directly to spark and avoid the 
unnecessary download and upload in SparkLauncher.java
 Key: PIG-5241
 URL: https://issues.apache.org/jira/browse/PIG-5241
 Project: Pig
  Issue Type: Sub-task
Reporter: liyunzhang_intel


//TODO: Specify the hdfs path directly to spark and avoid the unnecessary 
download and upload in SparkLauncher.java
{code}
  private void cacheFiles(String cacheFiles) throws IOException {
      if (cacheFiles != null && !cacheFiles.isEmpty()) {
          File tmpFolder = Files.createTempDirectory("cache").toFile();
          tmpFolder.deleteOnExit();
          for (String file : cacheFiles.split(",")) {
              String fileName = extractFileName(file.trim());
              Path src = new Path(extractFileUrl(file.trim()));
              File tmpFile = new File(tmpFolder, fileName);
              Path tmpFilePath = new Path(tmpFile.getAbsolutePath());
              FileSystem fs = tmpFilePath.getFileSystem(jobConf);
              // TODO: Specify the hdfs path directly to spark and avoid the
              // unnecessary download and upload in SparkLauncher.java
              fs.copyToLocalFile(src, tmpFilePath);
              tmpFile.deleteOnExit();
              LOG.info(String.format("CacheFile:%s", fileName));
              addResourceToSparkJobWorkingDirectory(tmpFile, fileName,
                      ResourceType.FILE);
          }
      }
  }
{code}
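One way to address the TODO (a sketch only; the helper name and the decision rule are assumptions, not the actual patch) is to inspect the URI scheme before localizing: a file already on a distributed filesystem such as HDFS could be handed to Spark directly, while scheme-less or file:// paths would still go through the download-and-upload step above.

```java
import java.net.URI;

public class CacheFilePathCheck {

    // Hypothetical helper: decide whether a cache file can be passed to
    // Spark as-is (already on a distributed filesystem) or must first be
    // copied locally as cacheFiles() currently does for every file.
    static boolean canPassDirectly(String fileUrl) {
        String scheme = URI.create(fileUrl.trim()).getScheme();
        // Paths with no scheme or a file:// scheme are local and still
        // need the copy into the Spark job working directory.
        return scheme != null && !scheme.equals("file");
    }

    public static void main(String[] args) {
        System.out.println(canPassDirectly("hdfs://nn:8020/tmp/udf.jar")); // true
        System.out.println(canPassDirectly("/tmp/udf.jar"));               // false
        System.out.println(canPassDirectly("file:///tmp/udf.jar"));        // false
    }
}
```

In SparkLauncher.java this check would sit just before the copyToLocalFile call, so only genuinely local files pay the transfer cost.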





[jira] Subscription: PIG patch available

2017-05-24 Thread jira
Issue Subscription
Filter: PIG patch available (38 issues)

Subscriber: pigdaily

Key      Summary
PIG-5238 Fix datetime related test issues after PIG-4748
https://issues.apache.org/jira/browse/PIG-5238
PIG-5236 json simple jar not included automatically while trying to load multiple schema in pig using avro
https://issues.apache.org/jira/browse/PIG-5236
PIG-5225 Several unit tests are not annotated with @Test
https://issues.apache.org/jira/browse/PIG-5225
PIG-5207 BugFix e2e tests fail on spark
https://issues.apache.org/jira/browse/PIG-5207
PIG-5194 HiveUDF fails with Spark exec type
https://issues.apache.org/jira/browse/PIG-5194
PIG-5184 set command to view value of a variable
https://issues.apache.org/jira/browse/PIG-5184
PIG-5160 SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
https://issues.apache.org/jira/browse/PIG-5160
PIG-5115 Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
https://issues.apache.org/jira/browse/PIG-5115
PIG-5106 Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
https://issues.apache.org/jira/browse/PIG-5106
PIG-5081 Can not run pig on spark source code distribution
https://issues.apache.org/jira/browse/PIG-5081
PIG-5080 Support store alias as spark table
https://issues.apache.org/jira/browse/PIG-5080
PIG-5057 IndexOutOfBoundsException when pig reducer processOnePackageOutput
https://issues.apache.org/jira/browse/PIG-5057
PIG-5029 Optimize sort case when data is skewed
https://issues.apache.org/jira/browse/PIG-5029
PIG-4926 Modify the content of start.xml for spark mode
https://issues.apache.org/jira/browse/PIG-4926
PIG-4924 Translate failures.maxpercent MR setting to Tez
https://issues.apache.org/jira/browse/PIG-4924
PIG-4913 Reduce jython function initiation during compilation
https://issues.apache.org/jira/browse/PIG-4913
PIG-4849 pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
https://issues.apache.org/jira/browse/PIG-4849
PIG-4750 REPLACE_MULTI should compile Pattern once and reuse it
https://issues.apache.org/jira/browse/PIG-4750
PIG-4700 Pig should call ProcessorContext.setProgress() in TezTaskContext
https://issues.apache.org/jira/browse/PIG-4700
PIG-4684 Exception should be changed to warning when job diagnostics cannot be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656 Improve String serialization and comparator performance in BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4598 Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4551 Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539 New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4515 org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4323 PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313 StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251 Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4002 Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952 PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911 Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877 Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873 Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3864 ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
https://issues.apache.org/jira/browse/PIG-3864
PIG-3668 COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3587 add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587
PIG-3103 make mockito a test dependency (instead of compile)
https://issues.apache.org/jira/browse/PIG-3103
PIG-1804 Alow Jython function to implement Algebraic and/or Accumulator interfaces
https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https:

[jira] [Commented] (PIG-5135) HDFS bytes read stats are always 0 in Spark mode

2017-05-24 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024218#comment-16024218
 ] 

liyunzhang_intel commented on PIG-5135:
---

[~szita]:
bq.I've checked this, it seems that assertEquals(30, 
inputStats.get(0).getBytes()); is fine, but assertEquals(18, 
inputStats.get(1).getBytes()); is not true, Spark returns -1 here. The plan 
generated for spark consists of 4 jobs, last one being the responsible for 
replicated join. This latter does 3 loads, and thus SparkPigStats handle this 
as -1. (Even after adding together all the bytes from all load ops in this job 
I got different result than 18.) I guess compression is also at work here on 
the tmp file part generation that further alters the number of bytes being read.
org.apache.pig.test.TestPigRunner#simpleMultiQueryTest3
{code}
#--
# Spark Plan  
#--

Spark node scope-53
Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage)
 - scope-54
|
|---A: New For Each(false,false,false)[bag] - scope-10
|   |
|   Cast[int] - scope-2
|   |
|   |---Project[bytearray][0] - scope-1
|   |
|   Cast[int] - scope-5
|   |
|   |---Project[bytearray][1] - scope-4
|   |
|   Cast[int] - scope-8
|   |
|   |---Project[bytearray][2] - scope-7
|
|---A: 
Load(hdfs://localhost:58892/user/root/input:org.apache.pig.builtin.PigStorage) 
- scope-0

Spark node scope-55
Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage)
 - scope-56
|
|---C: Filter[bag] - scope-14
|   |
|   Less Than or Equal[boolean] - scope-17
|   |
|   |---Project[int][1] - scope-15
|   |
|   |---Constant(5) - scope-16
|

|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage)
 - scope-10

Spark node scope-57
C: 
Store(hdfs://localhost:58892/user/root/output:org.apache.pig.builtin.PigStorage)
 - scope-21
|
|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage)
 - scope-14

Spark node scope-65
D: 
Store(hdfs://localhost:58892/user/root/output2:org.apache.pig.builtin.PigStorage)
 - scope-52
|
|---D: FRJoinSpark[tuple] - scope-44
|   |
|   Project[int][0] - scope-41
|   |
|   Project[int][0] - scope-42
|   |
|   Project[int][0] - scope-43
|

|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage)
 - scope-58
|
|---BroadcastSpark - scope-63
|   |
|   |---B: Filter[bag] - scope-26
|   |   |
|   |   Equal To[boolean] - scope-29
|   |   |
|   |   |---Project[int][0] - scope-27
|   |   |
|   |   |---Constant(3) - scope-28
|   |
|   
|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage)
 - scope-60
|
|---BroadcastSpark - scope-64
|
|---A1: New For Each(false,false,false)[bag] - scope-40
|   |
|   Cast[int] - scope-32
|   |
|   |---Project[bytearray][0] - scope-31
|   |
|   Cast[int] - scope-35
|   |
|   |---Project[bytearray][1] - scope-34
|   |
|   Cast[int] - scope-38
|   |
|   |---Project[bytearray][2] - scope-37
|
|---A1: 
Load(hdfs://localhost:58892/user/root/input2:org.apache.pig.builtin.PigStorage) 
- scope-30
{code}
 assertEquals(30, inputStats.get(0).getBytes()) is correct in spark mode, but
 assertEquals(18, inputStats.get(1).getBytes()) is wrong in spark mode because 
there are 3 loads in {{Spark node scope-65}}.  
[{{stats.get("BytesRead")}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L93]
 returns 49 (I guess this is the sum of the 
three loads: {{input2}}, {{tmp1818797386}}, {{tmp-546700946}}). But the current 
[{{bytesRead}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L91]
 is -1 because 
[{{singleInput}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L92]
 is false.
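The singleInput fallback described above can be sketched as follows (a simplified illustration of the behavior, not the actual SparkJobStats code):

```java
public class BytesReadSketch {

    // Mirrors the logic described above: the counter total is only
    // attributed to an input when the job has a single load; with
    // several loads the per-input value falls back to -1 (unknown).
    static long bytesRead(boolean singleInput, long counterTotal) {
        return singleInput ? counterTotal : -1L;
    }

    public static void main(String[] args) {
        // Spark node scope-65 has 3 loads, so singleInput is false
        // and the 49-byte counter total cannot be attributed.
        System.out.println(bytesRead(false, 49)); // -1
        // A job with a single load reports its counter total directly.
        System.out.println(bytesRead(true, 30));  // 30
    }
}
```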


Let's modify the code like this:
{code}

  // Since Tez has only one load per job, its values are correct;
  // the result of inputStats in spark mode is also correct.
  if (!Util.isMapredExecType(cluster.getExecType())) {
assertEquals(30, inputStats.get(0).getBytes());
  }

  //TODO PIG-5240:Fix TestPigRunner#simpleMultiQueryTest3 in spark mode 
for wrong inputStats
  if (!Util.isMapredExecType(cluster.getExecT

[jira] [Created] (PIG-5240) Fix TestPigRunner#simpleMultiQueryTest3 in spark mode for wrong inputStats

2017-05-24 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created PIG-5240:
-

 Summary: Fix TestPigRunner#simpleMultiQueryTest3 in spark mode for 
wrong inputStats
 Key: PIG-5240
 URL: https://issues.apache.org/jira/browse/PIG-5240
 Project: Pig
  Issue Type: Sub-task
Reporter: liyunzhang_intel


In TestPigRunner#simpleMultiQueryTest3,
the explain plan is:
{code}
#--
# Spark Plan  
#--

Spark node scope-53
Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage)
 - scope-54
|
|---A: New For Each(false,false,false)[bag] - scope-10
|   |
|   Cast[int] - scope-2
|   |
|   |---Project[bytearray][0] - scope-1
|   |
|   Cast[int] - scope-5
|   |
|   |---Project[bytearray][1] - scope-4
|   |
|   Cast[int] - scope-8
|   |
|   |---Project[bytearray][2] - scope-7
|
|---A: 
Load(hdfs://localhost:58892/user/root/input:org.apache.pig.builtin.PigStorage) 
- scope-0

Spark node scope-55
Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage)
 - scope-56
|
|---C: Filter[bag] - scope-14
|   |
|   Less Than or Equal[boolean] - scope-17
|   |
|   |---Project[int][1] - scope-15
|   |
|   |---Constant(5) - scope-16
|

|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage)
 - scope-10

Spark node scope-57
C: 
Store(hdfs://localhost:58892/user/root/output:org.apache.pig.builtin.PigStorage)
 - scope-21
|
|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage)
 - scope-14

Spark node scope-65
D: 
Store(hdfs://localhost:58892/user/root/output2:org.apache.pig.builtin.PigStorage)
 - scope-52
|
|---D: FRJoinSpark[tuple] - scope-44
|   |
|   Project[int][0] - scope-41
|   |
|   Project[int][0] - scope-42
|   |
|   Project[int][0] - scope-43
|

|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage)
 - scope-58
|
|---BroadcastSpark - scope-63
|   |
|   |---B: Filter[bag] - scope-26
|   |   |
|   |   Equal To[boolean] - scope-29
|   |   |
|   |   |---Project[int][0] - scope-27
|   |   |
|   |   |---Constant(3) - scope-28
|   |
|   
|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage)
 - scope-60
|
|---BroadcastSpark - scope-64
|
|---A1: New For Each(false,false,false)[bag] - scope-40
|   |
|   Cast[int] - scope-32
|   |
|   |---Project[bytearray][0] - scope-31
|   |
|   Cast[int] - scope-35
|   |
|   |---Project[bytearray][1] - scope-34
|   |
|   Cast[int] - scope-38
|   |
|   |---Project[bytearray][2] - scope-37
|
|---A1: 
Load(hdfs://localhost:58892/user/root/input2:org.apache.pig.builtin.PigStorage) 
- scope-30
{code}
 assertEquals(30, inputStats.get(0).getBytes()) is correct in spark mode, but
 assertEquals(18, inputStats.get(1).getBytes()) is wrong in spark mode because 
there are 3 loads in {{Spark node scope-65}}.  
[{{stats.get("BytesRead")}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L93]
 returns 49 (I guess this is the sum of the 
three loads: {{input2}}, {{tmp1818797386}}, {{tmp-546700946}}). But the current 
[{{bytesRead}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L91]
 is -1 because 
[{{singleInput}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L92]
 is false.







[jira] [Updated] (PIG-5239) Investigate why there are duplicated A[3,4] in TestLocationInPhysicalPlan#test in spark mode

2017-05-24 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-5239:
--
Issue Type: Sub-task  (was: Bug)
Parent: PIG-4059

> Investigate why there are duplicated A[3,4] in TestLocationInPhysicalPlan#test 
> in spark mode
> ---
>
> Key: PIG-5239
> URL: https://issues.apache.org/jira/browse/PIG-5239
> Project: Pig
>  Issue Type: Sub-task
>Reporter: liyunzhang_intel
>






[jira] [Commented] (PIG-4924) Translate failures.maxpercent MR setting to Tez

2017-05-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024195#comment-16024195
 ] 

Daniel Dai commented on PIG-4924:
-

+1

> Translate failures.maxpercent MR setting to Tez
> ---
>
> Key: PIG-4924
> URL: https://issues.apache.org/jira/browse/PIG-4924
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.17.0
>
> Attachments: PIG-4924-1.patch
>
>
> TEZ-3271 adds support equivalent to mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. We need to translate that per vertex.





[jira] [Commented] (PIG-4662) New optimizer rule: filter nulls before inner joins

2017-05-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024192#comment-16024192
 ] 

Daniel Dai commented on PIG-4662:
-

I prefer to do it in the optimizer; it seems clearer.

> New optimizer rule: filter nulls before inner joins
> ---
>
> Key: PIG-4662
> URL: https://issues.apache.org/jira/browse/PIG-4662
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ido Hadanny
>Assignee: Satish Subhashrao Saley
>Priority: Minor
>  Labels: Performance
> Fix For: 0.18.0
>
>
> As stated in the docs, rewriting an inner join and filtering nulls from 
> inputs can be a big performance gain: 
> http://pig.apache.org/docs/r0.14.0/perf.html#nulls
> We would like to add an optimizer rule which detects inner joins, and filters 
> nulls in all inputs:
> A = filter A by t is not null;
> B = filter B by x is not null;
> C = join A by t, B by x;
> see also: 
> http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining





[jira] [Commented] (PIG-4914) Add testcase for join with special characters in chararray

2017-05-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024182#comment-16024182
 ] 

Daniel Dai commented on PIG-4914:
-

This is only for the tuple join key, right? A string join key with utf8 characters is 
already covered in PIG-4358.

> Add testcase for join with special characters in chararray
> --
>
> Key: PIG-4914
> URL: https://issues.apache.org/jira/browse/PIG-4914
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
>   This jira is to add testcase for PIG-4821.





[jira] [Updated] (PIG-5185) Job name show "DefaultJobName" when running a Python script

2017-05-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-5185:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Rohini for review!

> Job name show "DefaultJobName" when running a Python script
> ---
>
> Key: PIG-5185
> URL: https://issues.apache.org/jira/browse/PIG-5185
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.17.0
>
> Attachments: PIG-5185-1.patch, PIG-5185-2.patch
>
>
> When running a Python script with Pig, the Hadoop WebUI shows "DefaultJobName" instead of 
> the script name. We should use the script name, with the same semantics as for a regular 
> Pig script.





[jira] [Created] (PIG-5239) Investigate why there are duplicated A[3,4] in TestLocationInPhysicalPlan#test in spark mode

2017-05-24 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created PIG-5239:
-

 Summary: Investigate why there are duplicated A[3,4] 
in TestLocationInPhysicalPlan#test in spark mode
 Key: PIG-5239
 URL: https://issues.apache.org/jira/browse/PIG-5239
 Project: Pig
  Issue Type: Bug
Reporter: liyunzhang_intel








Build failed in Jenkins: Pig-trunk-commit #2484

2017-05-24 Thread Apache Jenkins Server
See 


Changes:

[daijy] PIG-5185: Job name show "DefaultJobName" when running a Python script

--
[...truncated 172.29 KB...]
A src/org/apache/pig/tools/pigstats/tez/TezDAGStats.java
A src/org/apache/pig/tools/pigstats/tez/TezVertexStats.java
A src/org/apache/pig/tools/pigstats/ScriptState.java
A src/org/apache/pig/tools/pigstats/mapreduce
A src/org/apache/pig/tools/pigstats/mapreduce/MRScriptState.java
A src/org/apache/pig/tools/pigstats/mapreduce/MRJobStats.java
A src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java
A src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java
A src/org/apache/pig/tools/pigstats/PigStatusReporter.java
A src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java
A src/org/apache/pig/tools/pigstats/JobStats.java
A src/org/apache/pig/tools/streams
A src/org/apache/pig/tools/streams/StreamGenerator.java
A src/org/apache/pig/tools/grunt
A src/org/apache/pig/tools/grunt/GruntParser.java
A src/org/apache/pig/tools/grunt/Command.java
A src/org/apache/pig/tools/grunt/Grunt.java
A src/org/apache/pig/tools/grunt/ConsoleReaderInputStream.java
A src/org/apache/pig/tools/grunt/autocomplete
A src/org/apache/pig/tools/grunt/autocomplete_aliases
A src/org/apache/pig/tools/grunt/PigCompletor.java
A src/org/apache/pig/tools/grunt/PigCompletorAliases.java
A src/org/apache/pig/tools/timer
A src/org/apache/pig/tools/timer/PerformanceTimer.java
A src/org/apache/pig/tools/timer/PerformanceTimerFactory.java
A src/org/apache/pig/LoadMetadata.java
A src/org/apache/pig/scripting
A src/org/apache/pig/scripting/Pig.java
A src/org/apache/pig/scripting/SyncProgressNotificationAdaptor.java
A src/org/apache/pig/scripting/groovy
A src/org/apache/pig/scripting/groovy/AccumulatorGetValue.java
A src/org/apache/pig/scripting/groovy/GroovyEvalFuncObject.java
A src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java
A src/org/apache/pig/scripting/groovy/AlgebraicIntermed.java
A src/org/apache/pig/scripting/groovy/AccumulatorAccumulate.java
A src/org/apache/pig/scripting/groovy/AlgebraicFinal.java
A src/org/apache/pig/scripting/groovy/AlgebraicInitial.java
A src/org/apache/pig/scripting/groovy/GroovyAlgebraicEvalFunc.java
A src/org/apache/pig/scripting/groovy/GroovyUtils.java
A src/org/apache/pig/scripting/groovy/AccumulatorCleanup.java
A src/org/apache/pig/scripting/groovy/OutputSchemaFunction.java
A src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java
A src/org/apache/pig/scripting/groovy/GroovyEvalFunc.java
A src/org/apache/pig/scripting/ScriptPigContext.java
A src/org/apache/pig/scripting/ScriptingOutputCapturer.java
A src/org/apache/pig/scripting/streaming
A src/org/apache/pig/scripting/streaming/python
A src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java
A src/org/apache/pig/scripting/ScriptEngine.java
A src/org/apache/pig/scripting/jruby
A src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java
A src/org/apache/pig/scripting/jruby/RubySchema.java
A src/org/apache/pig/scripting/jruby/RubyDataBag.java
A src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java
A src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
A src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
A src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
A src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java
A src/org/apache/pig/scripting/jython
A src/org/apache/pig/scripting/jython/JythonUtils.java
A src/org/apache/pig/scripting/jython/JythonFunction.java
A src/org/apache/pig/scripting/jython/JythonScriptEngine.java
A src/org/apache/pig/scripting/BoundScript.java
A src/org/apache/pig/scripting/js
A src/org/apache/pig/scripting/js/JsFunction.java
A src/org/apache/pig/scripting/js/JsScriptEngine.java
A src/org/apache/pig/scripting/js/JSPig.java
A src/org/apache/pig/CollectableLoadFunc.java
A src/org/apache/pig/ErrorHandling.java
A src/org/apache/pig/StreamToPig.java
A src/org/apache/pig/FilterFunc.java
A src/org/apache/pig/SortColInfo.java
A src/org/apache/pig/pen
A src/org/apache/pig/pen/EquivalenceClasses.java
A src/org/apache/pig/pen/FakeRawKeyValueIterator.java
A src/org/apache/pig/pen/IllustratorAttacher.java
A src/org/apache/pig/pen/LocalMapReduceSimulator.java
A src/org/apache/pig/pen/ExampleGenerator.java
A src/org/

Re: Preparing for Pig 0.17 release

2017-05-24 Thread Ranjana Rajendran
Hi Rohini,
I want to unsubscribe from this email group. How do I do that?

On Wed, May 24, 2017 at 3:34 PM, Rohini Palaniswamy  wrote:

> Hi all,
>We are going to merge the Pig on Spark code by Friday and then branch
> for Pig 0.17 release. Will be moving all jiras marked for 0.17 that are not
> being worked on or not of high priority to 0.18 today. If you consider
> anything important for 0.17, please raise a comment in that jira.
>
> Regards,
> Rohini
>


[jira] [Updated] (PIG-4416) Fix or comment piggybank tests with ExecType.LOCAL

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4416:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Fix or comment piggybank tests with ExecType.LOCAL
> --
>
> Key: PIG-4416
> URL: https://issues.apache.org/jira/browse/PIG-4416
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Mohit Sabharwal
> Fix For: 0.18.0
>
>
> A git grep of the piggybank unit tests shows several remaining occurrences of 
> ExecType.LOCAL. These need to be fixed, or else a comment added to 
> explain why they should not run for Tez, Spark, etc.
>9 TestHiveColumnarLoader
>4 TestHiveColumnarStorage
>3 TestXMLLoader
>3 TestAllLoader
>2 TestAvroStorage
>1 TestSequenceFileLoader
>1 TestRegExLoader
>1 TestMyRegExLoader
>1 TestMultiStorageCompression
>1 TestMultiStorage
>1 TestLoadFuncHelper
>1 TestIndexedStorage
>1 TestHadoopJobHistoryLoader
>1 TestFixedWidthStorer
>1 TestFixedWidthLoader
>1 TestCommonLogLoader
>1 TestCombinedLogLoader
>1 TestCSVStorage
>1 TestCSVExcelStorage





[jira] [Updated] (PIG-5220) Improve NestedLimitOptimizer to handle general limit push up

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5220:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Improve NestedLimitOptimizer to handle general limit push up
> 
>
> Key: PIG-5220
> URL: https://issues.apache.org/jira/browse/PIG-5220
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>
> Currently, NestedLimitOptimizer only handles the case where limit comes right after sort. 
> In general, we should push up limit recursively, similar to LimitOptimizer.





[jira] [Updated] (PIG-5076) Pig 0.15.0 cannot STORE the same alias onto HDFS and Mysql both?

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5076:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Pig 0.15.0 cannot STORE the same alias onto HDFS and Mysql both?
> 
>
> Key: PIG-5076
> URL: https://issues.apache.org/jira/browse/PIG-5076
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.15.0
> Environment: Pig 0.15.0; hadoop 2.7.1; 
>Reporter: Joanlynn LIN
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>
> I am using Pig 0.15.0 and have found that maybe it does not support STOREing 
> an alias onto both HDFS and MySQL. The question is simplified as follows:
> first, I have a data file on hdfs://tmp/file, which contains:
>   
>   1046074327,40986
>   1473299786,1
> then, I created a Mysql table db_test, whose schema is:
>   CREATE TABLE `db_test` (
>   `id` bigint(20) NOT NULL,
>   `cnt` bigint(20) NOT NULL
>   ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
> then I have written a Pig script which runs in mapreduce mode on Hadoop 
> 2.7.1, and the script contains:
> REGISTER '/path/to/mysql-connector-java-5.1.38-bin.jar';
>   %declare DBHOST '127.0.0.1'
>   %declare DBPORT '3306'
>   %declare DATABASE 'test'
>   %declare USERNAME 'root'
>   %declare PASSWORD 'toor'
>   a = load '/tmp/file' USING PigStorage(',') AS (id:long, cnt:long);
>   STORE a INTO '/tmp/db_test2' USING PigStorage(',');
>   STORE a INTO 'db_test' USING 
> org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 
>   
> 'jdbc:mysql://$DBHOST:$DBPORT/$DATABASE?useUnicode=true&characterEncoding=utf-8',
>   '$USERNAME', '$PASSWORD', 
>   'REPLACE INTO db_test (id, cnt) VALUES (?,?)');
> However, the second STORE never works, and no error is reported. 
> If I comment out the first STORE line, the second STORE works! What 
> magic!
> I have tried Pig 0.16.0 in local mode on my own host and it cannot even 
> instantiate the MySQL driver:
>   Caused by: java.lang.RuntimeException: could not instantiate 
> 'org.apache.pig.piggybank.storage.DBStorage' with arguments 
> '[com.mysql.jdbc.Driver, 
> jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf-8, 
> root, toor, REPLACE INTO db_test (app_id, cnt) VALUES (?,?)]'
>   at 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:770)
>   at 
> org.apache.pig.parser.LogicalPlanBuilder.buildStoreOp(LogicalPlanBuilder.java:988)
>   ... 17 more
>   Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:738)
>   ... 18 more
>   Caused by: java.lang.RuntimeException: Can't load DB Driver
>   at org.apache.pig.piggybank.storage.DBStorage.(DBStorage.java:82)
>   at org.apache.pig.piggybank.storage.DBStorage.(DBStorage.java:71)
>   ... 23 more
>   Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:264)
>   at org.apache.pig.piggybank.storage.DBStorage.(DBStorage.java:79)
>   ... 24 more
> The 'instantiate' problem may be due to my environment settings, and I will 
> keep trying.
> Can somebody help me with the 'two STORE' problem? Could it possibly be a 
> bug? 





[jira] [Updated] (PIG-5219) IndexOutOfBoundsException when loading multiple directories with different schemas using OrcStorage

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5219:

Fix Version/s: (was: 0.17.0)
   0.18.0

> IndexOutOfBoundsException when loading multiple directories with different 
> schemas using OrcStorage
> ---
>
> Key: PIG-5219
> URL: https://issues.apache.org/jira/browse/PIG-5219
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.16.0
> Environment: Pig Version: 0.16.0
> OS: EMR 5.3.1
>Reporter: Omer Tal
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>
> Scenario:
> # Data set based on two hours in the same day. In hour 00 the ORC file has 4 
> columns {a,b,c,d} and during hour 02 it changes to 5 columns {a,b,c,d,e}
> # Loading ORC files with the same schema (hour 00):
> {code}
> x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage();
> dump x;
> {code}
> Result:
> {code}
> (1,2,3,4)
> (1,2,3,4)
> (1,2,3,4)
> (1,2,3,4)
> (1,2,3,4)
> (1,2,3,4)
> (1,2,3,4)
> {code}
> # Loading ORC files with different schemas in the same directory:
> {code}
> x = load 's3://orc_files/dt=2017-03-21/hour=02' using OrcStorage();
> dump x;
> {code}
> Result:
> {code}
> (1,2,3,4,5)
> (1,2,3,4,5)
> (1,2,3,4,5)
> (1,2,3,4,5)
> (1,2,3,4,5)
> (1,2,3,4,5)
> (1,2,3,4,5)
> (1,2,3,4)
> (1,2,3,4)
> (1,2,3,4)
> (1,2,3,4)
> {code}
> # Loading the whole day (both hour 00 and 02):
> {code}
> x = load 's3://orc_files/dt=2017-03-21' using OrcStorage();
> dump x;
> {code}
> Result:
> {code}
> 37332 [PigTezLauncher-0] INFO  
> org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - DAG Status: 
> status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 
> Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, 
> vertexName=scope-2, vertexId=vertex_1491991474861_0006_1_00, 
> diagnostics=[Task failed, taskId=task_1491991474861_0006_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1491991474861_0006_1_00_00_0:java.lang.IndexOutOfBoundsException: 
> Index: 4, Size: 4
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at 
> org.apache.pig.impl.util.hive.HiveUtils.convertHiveToPig(HiveUtils.java:97)
> at org.apache.pig.builtin.OrcStorage.getNext(OrcStorage.java:381)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:119)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}





[jira] [Updated] (PIG-4872) More POPartialAgg processing and spill improvements

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4872:

Fix Version/s: (was: 0.17.0)
   0.18.0

> More POPartialAgg processing and spill improvements
> ---
>
> Key: PIG-4872
> URL: https://issues.apache.org/jira/browse/PIG-4872
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>






[jira] [Updated] (PIG-5059) Pig 0.16 e2e Types_Order tests failed with Sort check failed

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5059:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Pig 0.16 e2e Types_Order tests failed with Sort check failed
> 
>
> Key: PIG-5059
> URL: https://issues.apache.org/jira/browse/PIG-5059
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Affects Versions: 0.16.0
> Environment: Ubuntu 14.04
> Mac OS
> CentOS 7.1
>Reporter: Konstantin Harasov
> Fix For: 0.18.0
>
>
> Env: core 5.2, centOS 7.1
> pig: pig-0.16
> Pig 0.16 e2e tests Types_Order_1,2,3,4,11,12,13,14,15,16 failed because of 
> Sort check failed.
> test-base:
>  [exec] =
>  [exec] LOGGING RESULTS TO 
> /opt/pig/pig-0.16/test/e2e/pig/testdist/out/log/test_harnesss_1478952742
>  [exec] =
>  [exec] Results so far, PASSED: 0  FAILED: 1   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 0  FAILED: 2   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 0  FAILED: 3   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 0  FAILED: 4   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 1  FAILED: 4   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 2  FAILED: 4   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 3  FAILED: 4   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 4  FAILED: 4   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 5  FAILED: 4   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 6  FAILED: 4   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 6  FAILED: 5   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 6  FAILED: 6   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 6  FAILED: 7   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 6  FAILED: 8   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 6  FAILED: 9   SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Results so far, PASSED: 6  FAILED: 10  SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
>  [exec] Final results,  PASSED: 6  FAILED: 10  SKIPPED: 0  ABORTED: 0  FAILED DEPENDENCY: 0
> BUILD FAILED
> TEST: Types_Order_1
> sort 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_1_benchmark.out/out_original
> test cksum: 1595601925 208685
> benchmark cksum: 1595601925 208685
> Going to run sort check command: sort -cs -t -k 1,1 -k 2n,3n 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_1.out/out_original
> /bin/sort: 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_1.out/out_original:27: 
> disorder: 18  
> Sort check failed
> TEST: Types_Order_2
> Going to run sort check command: sort -cs -t -k 1r,1r -k 2nr,3nr 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_2.out/out_original
> /bin/sort: 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_2.out/out_original:23: 
> disorder: zach young  3.34
> Sort check failed
> TEST: Types_Order_3
> Going to run sort check command: sort -cs -t -k 1,1 -k 2n,3n 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_3.out/out_original
> /bin/sort: 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_3.out/out_original:27: 
> disorder: 18  
> Sort check failed
> TEST: Types_Order_4
> Going to run sort check command: sort -cs -t -k 1r,1r -k 2nr,3nr 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_4.out/out_original
> /bin/sort: 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_4.out/out_original:23: 
> disorder: zach young  3.34
> Sort check failed
> TEST: Types_Order_11
> Going to run sort check command: sort -cs -t -k 3n 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_11.out/out_original
> /bin/sort: 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_11.out/out_original:731: 
> disorder: oscar underhill   58  0.1
> Sort check failed
> TEST: Types_Order_12
> Going to run sort check command: sort -cs -t -k 3nr 
> ./out/pigtest/-1478952742-nightly.conf/Types_Order_12.out/out_original
> /bin/sort: 
> ./out/pigtest/-

[jira] [Updated] (PIG-4910) Assert wrongly pushed up in optimizer

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4910:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Assert wrongly pushed up in optimizer
> -
>
> Key: PIG-4910
> URL: https://issues.apache.org/jira/browse/PIG-4910
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>
> The following script fail:
> {code}
> TEST_DATA = LOAD 'input' USING PigStorage() AS (c1:int);
> GR = FOREACH (GROUP TEST_DATA BY c1) GENERATE group as c1, 
> COUNT_STAR(TEST_DATA) as count, TEST_DATA;
> ROWS_WITH_C1_EQUALS_ZERO = FILTER GR BY count > 1L;
> ROWS_WITH_C1_EQUALS_ZERO_FLATTENED = FOREACH ROWS_WITH_C1_EQUALS_ZERO 
> GENERATE FLATTEN($0);
> -- Assert shouldn't fail as it should be applied after group by but because 
> assert is getting pushed to mapper, it is failing.
> ASSERT ROWS_WITH_C1_EQUALS_ZERO_FLATTENED BY c1 == 0, 'Should have never seen 
> this message, assert has a bug.';
> DUMP ROWS_WITH_C1_EQUALS_ZERO_FLATTENED;
> {code}
> input:
> 0
> 0
> 1
> The reason is that the assert is pushed before the FILTER:
> {code}
> ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOStore Schema: c1#14:int)
> |
> |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOForEach Schema: c1#14:int)
> |   |
> |   (Name: LOGenerate[true] Schema: 
> c1#14:int)ColumnPrune:InputUids=[14]ColumnPrune:OutputUids=[14]
> |   |   |
> |   |   c1:(Name: Project Type: int Uid: 14 Input: 0 Column: (*))
> |   |
> |   |---(Name: LOInnerLoad[0] Schema: c1#14:int)
> |
> |---ROWS_WITH_C1_EQUALS_ZERO: (Name: LOFilter Schema: 
> c1#14:int,count#31:long,ROWS_WITH_C1_EQUALS_ZERO_FLATTENED#29:bag{#30:tuple(c1#14:int)})
> |   |
> |   (Name: GreaterThan Type: boolean Uid: 33)
> |   |
> |   |---count:(Name: Project Type: long Uid: 31 Input: 0 Column: 1)
> |   |
> |   |---(Name: Constant Type: long Uid: 32)
> |
> |---GR: (Name: LOForEach Schema: 
> c1#14:int,count#31:long,ROWS_WITH_C1_EQUALS_ZERO_FLATTENED#29:bag{#30:tuple(c1#14:int)})
> |   |
> |   (Name: LOGenerate[false,false,false] Schema: 
> c1#14:int,count#31:long,ROWS_WITH_C1_EQUALS_ZERO_FLATTENED#29:bag{#30:tuple(c1#14:int)})ColumnPrune:InputUids=[29,
>  14]ColumnPrune:OutputUids=[14, 31]
> |   |   |
> |   |   group:(Name: Project Type: int Uid: 14 Input: 0 Column: 
> (*))
> |   |   |
> |   |   (Name: UserFunc(org.apache.pig.builtin.COUNT_STAR) Type: 
> long Uid: 31)
> |   |   |
> |   |   |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED:(Name: Project 
> Type: bag Uid: 29 Input: 1 Column: (*))
> |   |   |
> |   |   ROWS_WITH_C1_EQUALS_ZERO_FLATTENED:(Name: Project Type: 
> bag Uid: 29 Input: 2 Column: (*))
> |   |
> |   |---(Name: LOInnerLoad[0] Schema: group#14:int)
> |   |
> |   |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOInnerLoad[1] 
> Schema: c1#14:int)
> |   |
> |   |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOInnerLoad[1] 
> Schema: c1#14:int)
> |
> |---1-3: (Name: LOCogroup Schema: 
> group#14:int,ROWS_WITH_C1_EQUALS_ZERO_FLATTENED#29:bag{#44:tuple(c1#14:int)})
> |   |
> |   c1:(Name: Project Type: int Uid: 14 Input: 0 Column: 0)
> |
> |---ROWS_WITH_C1_EQUALS_ZERO_FLATTENED: (Name: LOFilter 
> Schema: c1#14:int)
> |   |
> |   (Name: UserFunc(org.apache.pig.builtin.Assert) Type: 
> boolean Uid: 40)
> |   |
> |   |---(Name: BinCond Type: boolean Uid: 38)
> |   |   |
> |   |   |---(Name: Equal Type: boolean Uid: 35)
> |   |   |   |
> |   |   |   |---c1:(Name: Project Type: int Uid: 14 
> Input: 0 Column: 0)
> |   |   |   |
> |   |   |   |---(Name: Constant Type: int Uid: 34)
> |   |   |
> |   |   |---(Name: Constant Type: boolean Uid: 36)
> |   |   |
> |   |   |---(Name: Constant Type: boolean Uid: 37)
> |   |
> |   |---(Name: Constant Type: chararray Uid: 39)
> |
> |---TEST_DATA: (Name: LOForEach Schema: c1#14:int)
> |   |
> |   (Name: LOGenerate[false] Schema: 
> c1#14:int)ColumnPrune:InputUids=[14]ColumnPrune:OutputUids=[14]
> |   |   |
> |   |   (Name: Cast Type: int Uid: 14)
>   

[jira] [Updated] (PIG-4653) Remove unwanted config set on Tez DAG, vertices and edges

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4653:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Remove unwanted config set on Tez DAG, vertices and edges
> -
>
> Key: PIG-4653
> URL: https://issues.apache.org/jira/browse/PIG-4653
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
>This bloats up the data sent in DAG submission and causes some DAGs 
> to fail with java.io.IOException: Requested data length 127200464 is longer 
> than maximum configured RPC length 67108864. 
>   It also overwhelms the Tez AM and makes it hit OOM while processing getTask 
> requests from hundreds of tasks concurrently, as the config payload in Input, 
> Output and PigProcessor causes RPC buffers to overflow. 





[jira] [Updated] (PIG-4120) Broadcast the index file in case of POMergeCoGroup and POMergeJoin

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4120:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Broadcast the index file in case of POMergeCoGroup and POMergeJoin
> --
>
> Key: PIG-4120
> URL: https://issues.apache.org/jira/browse/PIG-4120
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
> Currently merge join and merge cogroup use two DAGs - the first DAG creates 
> the index file in HDFS and the second DAG does the merge join.  Similar to 
> replicated join, we can broadcast the index file, cache it, and use it in 
> merge join and merge cogroup. This will give better performance and also 
> eliminate the need for the second DAG.





[jira] [Updated] (PIG-4740) e2e tests for PIG-4417: repo fetching for register

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4740:

Fix Version/s: (was: 0.17.0)
   0.18.0

> e2e tests for PIG-4417: repo fetching for register
> --
>
> Key: PIG-4740
> URL: https://issues.apache.org/jira/browse/PIG-4740
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>






[jira] [Updated] (PIG-4555) Add -XX:+UseNUMA for Tez jobs

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4555:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Add -XX:+UseNUMA for Tez jobs
> -
>
> Key: PIG-4555
> URL: https://issues.apache.org/jira/browse/PIG-4555
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
> For very big Tez jobs (~50K tasks), the AM quickly goes OOM without 
> -XX:+UseNUMA. The tez.am.launch.cmd-opts default setting has it, but since Pig 
> gives preference to yarn.app.mapreduce.am.command-opts if present (which it 
> usually is), -XX:+UseNUMA is not there. We need to add -XX:+UseNUMA if we 
> are picking up the mapreduce setting.





[jira] [Updated] (PIG-4400) Documentation for RollupHIIOptimizer (PIG-4066)

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4400:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Documentation for RollupHIIOptimizer (PIG-4066)
> ---
>
> Key: PIG-4400
> URL: https://issues.apache.org/jira/browse/PIG-4400
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Quang-Nhat HOANG-XUAN
>Assignee: Daniel Dai
>Priority: Critical
>  Labels: hybrid-irg, rollup
> Fix For: 0.18.0
>
> Attachments: PIG-4400-1.patch
>
>
> Adding documentation for RollupHIIOptimizer.
> Please refer to PIG-4066.





[jira] [Updated] (PIG-4735) PartitionerDefinedVertexManager should do slowstart

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4735:

Fix Version/s: (was: 0.17.0)
   0.18.0

> PartitionerDefinedVertexManager should do slowstart
> ---
>
> Key: PIG-4735
> URL: https://issues.apache.org/jira/browse/PIG-4735
> Project: Pig
>  Issue Type: Improvement
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>
> Currently all the partitioner vertex and final reducer vertex tasks are 
> started as soon as the sampler vertex completes. The final reducer vertex 
> tasks should only start after a percentage of partitioner vertex tasks have 
> completed. We need to do the same kind of slow start as ShuffleVertexManager, 
> honoring ShuffleVertexManager.TEZ_SHUFFLE_VERTEX_MANAGER_MIN_SRC_FRACTION and 
> ShuffleVertexManager.TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION 
> configurations and their defaults.





[jira] [Updated] (PIG-4341) Add CMX support to pig.tmpfilecompression.codec

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4341:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Add CMX support to pig.tmpfilecompression.codec
> ---
>
> Key: PIG-4341
> URL: https://issues.apache.org/jira/browse/PIG-4341
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.13.0
>Reporter: fang fang chen
>Assignee: fang fang chen
> Fix For: 0.18.0
>
> Attachments: PIG-4341.patch
>
>
> Pig supports temporary-file compression codecs (GZ, GZIP, LZO), but does not 
> support the CMX codec yet. CMX is "com.ibm.biginsights.compress.CmxCodec". This 
> information can also be found in the latest release pig-0.13.0 documentation: 
> http://pig.apache.org/docs/r0.13.0/perf.html. 
> I once tested the CMX codec with pig-0.13.0 using the following 
> settings:
> SET pig.tmpfilecompression true;
> SET pig.tmpfilecompression.codec cmx;
> Error:
> Caused by: java.io.IOException: Invalid temporary file compression codec 
> [cmx]. Expected compression codecs for org.apache.pig.impl.io.TFileStorage 
> are GZ,GZIP,LZO.





[jira] [Updated] (PIG-4249) Size estimation should be done in sampler instead of sample aggregator

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4249:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Size estimation should be done in sampler instead of sample aggregator
> --
>
> Key: PIG-4249
> URL: https://issues.apache.org/jira/browse/PIG-4249
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
> https://reviews.apache.org/r/21302/ comments on Revision 8 - size estimation 
> is done in the sample aggregator for order by to keep it the same as skewed join, 
> but this can have performance implications if the tuple sizes are big.





[jira] [Updated] (PIG-4573) Set minimal configured required for Tez

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4573:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Set minimal configured required for Tez
> ---
>
> Key: PIG-4573
> URL: https://issues.apache.org/jira/browse/PIG-4573
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
>   Currently all the settings from core-site, hdfs-site, mapred-site and 
> yarn-site are set on the DAG, on the processor of every vertex, and on all 
> edges and their inputs and outputs. This really bloats things up and creates 
> scaling issues with ATS.





[jira] [Updated] (PIG-4415) Fix or comment tests with ExecType.LOCAL

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4415:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Fix or comment tests with ExecType.LOCAL
> 
>
> Key: PIG-4415
> URL: https://issues.apache.org/jira/browse/PIG-4415
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Mohit Sabharwal
> Fix For: 0.18.0
>
>
> A quick git grep of the unit tests shows several remaining occurrences of 
> ExecType.LOCAL.  These need to be fixed, or a comment added to 
> indicate why they should not run for Tez, Spark, etc.
> (Following list does not have piggybank tests)
>   14 TestGrunt
>7 TestFinish
>4 TestTypeCheckingValidatorNewLP
>4 TestPigScriptParser
>3 TestQueryParser
>2 TestSchemaTuple
>2 TestPredeployedJar
>2 TestPigStorage
>2 TestPigServer
>2 TestParamSubPreproc
>2 TestLogToPhyCompiler
>2 TestAvroStorage
>1 TypeCheckingTestUtil
>1 TestShortcuts
>1 TestScalarVisitor
>1 TestQueryParserUtils
>1 TestProjectStarRangeInUdf
>1 TestPlanGeneration
>1 TestPinOptions
>1 TestPi
>1 TestParser
>1 TestOrderBy2
>1 TestOptimizeLimit
>1 TestNewPlanPushUpFilter
>1 TestNewPlanPushDownForeachFlatten
>1 TestNewPlanPruneMapKeys
>1 TestNewPlanOperatorPlan
>1 TestNewPlanLogicalOptimizer
>1 TestNewPlanLogToPhyTranslationVisitor
>1 TestNewPlanFilterRule
>1 TestNewPlanFilterAboveForeach
>1 TestNewPartitionFilterPushDown
>1 TestMergeForEachOptimization
>1 TestMapProjectionDuplicate
>1 TestMRCompiler
>1 TestLogicalPlanBuilder
>1 TestLoaderStorerShipCacheFiles
>1 TestHBaseStorage
>1 TestExampleGenerator
>1 TestErrorHandling
>1 TestConstantCalculator





[jira] [Updated] (PIG-4958) Tez autoparallelism estimation for order by is higher than mapreduce

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4958:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Tez autoparallelism estimation for order by is higher than mapreduce
> 
>
> Key: PIG-4958
> URL: https://issues.apache.org/jira/browse/PIG-4958
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
> Attachments: PIG-4958-1.patch, PIG-4958-2.patch, 
> PIG-4958-withoutsecurity.patch
>
>
>   The input size is calculated from the size of the samples in memory. The size 
> in memory is usually 4x or more of the serialized size. MapReduce estimates 
> the number of reducers based on the serialized size.





[jira] [Updated] (PIG-4738) Implement StoreResources.getShipFiles() for all piggybank LoadFunc/StoreFunc/EvalFunc

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4738:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Implement StoreResources.getShipFiles() for all piggybank 
> LoadFunc/StoreFunc/EvalFunc
> -
>
> Key: PIG-4738
> URL: https://issues.apache.org/jira/browse/PIG-4738
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>  Labels: newbie
> Fix For: 0.18.0
>
>
>  piggybank.jar is always in the classpath of Pig, so this will save Pig users 
> from having to register piggybank.jar in their scripts when they use any 
> function from piggybank. 





[jira] [Updated] (PIG-4749) Include a 'latest' documentation directory

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4749:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Include a 'latest' documentation directory
> --
>
> Key: PIG-4749
> URL: https://issues.apache.org/jira/browse/PIG-4749
> Project: Pig
>  Issue Type: Improvement
>Reporter: Savvas Savvides
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.18.0
>
>
> Some Apache projects, e.g. Spark, include a 'latest' directory as well as a 
> per-version directory under their documentation. I found that useful, especially 
> when comparing the latest version of the product with some previous version. 
> It has also been my experience that it helps when accessing documentation 
> through a search engine.





[jira] [Updated] (PIG-5030) kill command only kill application the session launches in Tez mode

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5030:

Fix Version/s: (was: 0.17.0)
   0.18.0

> kill command only kill application the session launches in Tez mode
> ---
>
> Key: PIG-5030
> URL: https://issues.apache.org/jira/browse/PIG-5030
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>
> "kill applicationId" does not work in general. Here is the existing code:
> {code}
> public void killJob(String jobID, Configuration conf) throws BackendException 
> {
> if (runningJob != null && runningJob.getApplicationId().toString() == 
> jobID) {
> try {
> runningJob.killJob();
> } catch (Exception e) {
> throw new BackendException(e);
> }
> } else {
> log.info("Cannot find job: " + jobID);
> }
> }
> {code}
> It only kills the application that the client launched. This is different from what we 
> have in MapReduce.
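Worth noting: the guard `runningJob.getApplicationId().toString() == jobID` in the snippet above uses `==`, which in Java compares references rather than contents, so it can be false even for the session's own application ID. A minimal, self-contained illustration (the ID value is made up):

```java
public class StringCompareDemo {
    public static void main(String[] args) {
        // Two distinct String objects with identical contents, mimicking an
        // application ID built from the running job vs. one supplied by the user.
        String fromClient = new String("application_1491991474861_0006");
        String fromKillCommand = new String("application_1491991474861_0006");

        System.out.println(fromClient == fromKillCommand);      // prints: false (reference comparison)
        System.out.println(fromClient.equals(fromKillCommand)); // prints: true (content comparison)
    }
}
```

So even before extending kill to arbitrary applications, the comparison should use `equals()`.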





[jira] [Updated] (PIG-4694) MultiStorageOutputFormat does not honor mapreduce.base.outputname

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4694:

Fix Version/s: (was: 0.17.0)
   0.18.0

> MultiStorageOutputFormat does not honor mapreduce.base.outputname
> -
>
> Key: PIG-4694
> URL: https://issues.apache.org/jira/browse/PIG-4694
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
> Offending piece of code.
> {code}
> Path path = new Path(fieldValue+extension, fieldValue + '-'
> + nf.format(taskId.getId())+extension);
> {code}
> Currently MultiStorage is part of pig.tez.opt.union.unsupported.storefuncs. 
> After fixing this, it needs to be removed from there.





[jira] [Updated] (PIG-4912) Tez code does not differentiate between cache archives and files

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4912:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Tez code does not differentiate between cache archives and files
> 
>
> Key: PIG-4912
> URL: https://issues.apache.org/jira/browse/PIG-4912
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Artem Ervits
> Fix For: 0.18.0
>
> Attachments: PIG-4912-0.patch, PIG-4912-1.patch
>
>
> The MapReduce code handles archives but the Tez code does not.
> {code}
> if (DISTRIBUTED_CACHE_ARCHIVE_MATCHER.reset(uri.toString()).find()) {
> DistributedCache.addCacheArchive(uri, conf);
> } else {
> DistributedCache.addCacheFile(uri, conf);
> }
> {code}
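As a rough sketch of what the Tez side could mirror, an extension-based check like the MapReduce matcher above can separate archives from plain files. The suffix list below is illustrative only; the actual `DISTRIBUTED_CACHE_ARCHIVE_MATCHER` pattern in Pig may differ:

```java
import java.util.regex.Pattern;

public class ArchiveMatcherDemo {
    // Illustrative archive-suffix pattern; not Pig's actual matcher.
    private static final Pattern ARCHIVE_SUFFIX =
            Pattern.compile("\\.(zip|tar|tar\\.gz|tgz|jar)$");

    static boolean isArchive(String uri) {
        return ARCHIVE_SUFFIX.matcher(uri).find();
    }

    public static void main(String[] args) {
        // Archives would go to addCacheArchive-style handling, files to addCacheFile.
        System.out.println(isArchive("hdfs:///tmp/udfs.tar.gz")); // prints: true
        System.out.println(isArchive("hdfs:///tmp/lookup.txt"));  // prints: false
    }
}
```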





[jira] [Updated] (PIG-4566) Reimplement PIG-4066: An optimization for ROLLUP operation in Pig

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4566:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Reimplement PIG-4066: An optimization for ROLLUP operation in Pig
> -
>
> Key: PIG-4566
> URL: https://issues.apache.org/jira/browse/PIG-4566
> Project: Pig
>  Issue Type: New Feature
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
> Attachments: PIG-4566-1.patch
>
>
> There are some issues in the original implementation of PIG-4066. Since the 
> fix will touch most parts of the patch, I'd like to roll back PIG-4066 and 
> reimplement here.





[jira] [Updated] (PIG-4962) Estimate smaller Tez AM memory for smaller count of tasks

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4962:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Estimate smaller Tez AM memory for smaller count of tasks
> -
>
> Key: PIG-4962
> URL: https://issues.apache.org/jira/browse/PIG-4962
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
> PIG-4948 reported that having a 1G heap as the minimum caused problems for small 
> test clusters. We could use 128 MB to 768 MB if the number of tasks is between 100 
> and 1000.
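One way to read that suggestion is a linear ramp between the two endpoints. The interpolation below only illustrates the idea; the thresholds come from the description above, and this is not the actual estimator:

```java
public class AmMemoryEstimateDemo {
    // Illustrative: 128 MB at <=100 tasks, 768 MB at >=1000 tasks,
    // linear in between. Not Pig's real AM memory estimator.
    static int estimateAmMemoryMb(int totalTasks) {
        if (totalTasks <= 100) {
            return 128;
        }
        if (totalTasks >= 1000) {
            return 768;
        }
        return 128 + (totalTasks - 100) * (768 - 128) / 900;
    }

    public static void main(String[] args) {
        System.out.println(estimateAmMemoryMb(50));   // prints: 128
        System.out.println(estimateAmMemoryMb(550));  // prints: 448
        System.out.println(estimateAmMemoryMb(5000)); // prints: 768
    }
}
```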





[jira] [Updated] (PIG-4914) Add testcase for join with special characters in chararray

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4914:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Add testcase for join with special characters in chararray
> --
>
> Key: PIG-4914
> URL: https://issues.apache.org/jira/browse/PIG-4914
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
>   This jira is to add testcase for PIG-4821.





[jira] [Updated] (PIG-4922) Deadlock between SpillableMemoryManager and InternalSortedBag$SortedDataBagIterator

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4922:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Deadlock between SpillableMemoryManager and 
> InternalSortedBag$SortedDataBagIterator
> ---
>
> Key: PIG-4922
> URL: https://issues.apache.org/jira/browse/PIG-4922
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
> Attachments: PIG-4922-1.patch
>
>
>   This one ran into a deadlock, when the data was really huge and 
> InternalSortedBag was reading spilled data from disk.
> {code}
> grpd = FOREACH (GROUP data BY $0){
> sorted = ORDER data BY timestamp DESC;
> latest = LIMIT sorted 1;
> GENERATE latest;
> };
> {code}





[jira] [Updated] (PIG-4626) [Pig on Tez] OOM in case of multiple outputs and POPartialAgg

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4626:

Fix Version/s: (was: 0.17.0)
   0.18.0

> [Pig on Tez] OOM in case of multiple outputs and POPartialAgg
> -
>
> Key: PIG-4626
> URL: https://issues.apache.org/jira/browse/PIG-4626
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
> tez.task.scale.memory.reserve-fraction is 0.3 by default, which assumes 70% 
> of memory is available for io.sort.mb. If the map is configured with 1G Xmx 
> and io.sort.mb as 256MB, and there is a group by on 3 different keys, there 
> are 3 different outputs and Tez's WeightedScalingMemoryDistributor allocates 
> 256MB to each of them, which leaves very little memory for Pig and 
> POPartialAgg (which requires 20% of memory) and sometimes leads to OOM. This 
> also causes SpillableMemoryManager to be invoked often, as its threshold is 
> set to 70% of memory.
> Need to set tez.task.scale.memory.reserve-fraction as 0.5 for all pig jobs 
> and 0.6 in case there is POPartialAgg in combiner plan.
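The proposed fraction values, and their effect on how much memory Tez can hand to the outputs, can be sketched as follows (class and method names are illustrative, not Pig's code):

```java
public class ReserveFractionRule {
    // Proposed rule: 0.5 for all Pig jobs, 0.6 when the combiner plan
    // contains a POPartialAgg, so more heap is held back from output buffers.
    public static double reserveFraction(boolean combinerHasPartialAgg) {
        return combinerHasPartialAgg ? 0.6 : 0.5;
    }

    // Memory Tez may distribute to outputs after reserving the fraction.
    public static long scalableMemoryMb(long xmxMb, double reserveFraction) {
        return (long) (xmxMb * (1.0 - reserveFraction));
    }

    public static void main(String[] args) {
        // With the 0.3 default, a 1024 MB task leaves ~716 MB to be split
        // among outputs; with 0.5, only 512 MB, leaving more heap for Pig.
        System.out.println(scalableMemoryMb(1024, 0.3)); // 716
        System.out.println(scalableMemoryMb(1024, reserveFraction(false))); // 512
        System.out.println(scalableMemoryMb(1024, reserveFraction(true)));  // 409
    }
}
```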





[jira] [Updated] (PIG-5129) Add a global progress bar for Tez

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5129:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Add a global progress bar for Tez
> -
>
> Key: PIG-5129
> URL: https://issues.apache.org/jira/browse/PIG-5129
> Project: Pig
>  Issue Type: Improvement
>  Components: tez
>Reporter: Daniel Dai
> Fix For: 0.18.0
>
>
> In MR, we have a progress bar which tracks the percentage complete of the 
> Pig script. In Tez, we have a Tez progress bar in the form of "TotalTasks: 2 
> Succeeded: 2" for a single DAG. However, we don't track the number of DAGs 
> launched. An easy solution for a Tez progress bar is an equally distributed 
> percentage across DAGs, and within a DAG, the percentage of tasks completed.
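The "equally distributed percentage across DAGs" idea can be sketched as follows; the names are hypothetical, not the actual Pig/Tez API:

```java
public class GlobalTezProgress {
    // Overall script progress: each DAG gets an equal share, and within the
    // running DAG, progress is the fraction of its tasks that have succeeded.
    public static double overallProgress(int totalDags, int completedDags,
                                         int currentDagTotalTasks,
                                         int currentDagSucceededTasks) {
        double withinDag = currentDagTotalTasks == 0
                ? 0.0
                : (double) currentDagSucceededTasks / currentDagTotalTasks;
        return (completedDags + withinDag) / totalDags;
    }

    public static void main(String[] args) {
        // 3 DAGs: the first is done, the second is half-way -> 50% overall.
        System.out.println(overallProgress(3, 1, 4, 2)); // 0.5
    }
}
```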





[jira] [Updated] (PIG-5216) Customizable Error Handling for Loaders in Pig

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5216:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Customizable Error Handling for Loaders in Pig
> --
>
> Key: PIG-5216
> URL: https://issues.apache.org/jira/browse/PIG-5216
> Project: Pig
>  Issue Type: Improvement
>Reporter: Iris Zeng
>Assignee: Iris Zeng
> Fix For: 0.18.0
>
> Attachments: PIG-5216-1.patch, PIG-5216-2.patch, PIG-5216-3.patch
>
>
> Add error handling for loaders in Pig, so that users can choose to tolerate 
> errors when loading data and set an allowed error count / rate.
> Ideas are based on the error handling for store funcs; see 
> https://issues.apache.org/jira/browse/PIG-4704





[jira] [Updated] (PIG-4373) Implement PIG-3861 in Tez

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4373:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Implement PIG-3861 in Tez
> -
>
> Key: PIG-4373
> URL: https://issues.apache.org/jira/browse/PIG-4373
> Project: Pig
>  Issue Type: Improvement
>  Components: tez
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>






[jira] [Updated] (PIG-4672) Document performance implication for Hive UDF

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4672:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Document performance implication for Hive UDF
> -
>
> Key: PIG-4672
> URL: https://issues.apache.org/jira/browse/PIG-4672
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>
> We shall document the performance using Hive UDF vs Pig native UDF.





[jira] [Updated] (PIG-5191) Pig HBase 2.0.0 support

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5191:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Pig HBase 2.0.0 support
> ---
>
> Key: PIG-5191
> URL: https://issues.apache.org/jira/browse/PIG-5191
> Project: Pig
>  Issue Type: Improvement
>Reporter: Nandor Kollar
> Fix For: 0.18.0
>
>
> Pig doesn't support HBase 2.0.0. Since the new HBase API introduces several 
> API changes, we should find a way to support both 1.x and 2.x HBase API.





[jira] [Updated] (PIG-4652) [Pig on Tez] Key Comparison is slower than mapreduce

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4652:

Fix Version/s: (was: 0.17.0)
   0.18.0

> [Pig on Tez] Key Comparison is slower than mapreduce
> 
>
> Key: PIG-4652
> URL: https://issues.apache.org/jira/browse/PIG-4652
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
> Tez uses PigTupleSortComparator on both the map and reduce side and in 
> POShuffleTezLoad.  Mapreduce uses PigTupleWritableComparator on the map 
> and reduce side for comparing tuples, which is a bytes-only comparison and 
> very fast.  It then uses PigGroupingWritableComparator as the grouping 
> comparator to correctly group those keys. 
>   It is not possible to use a similar method in Tez 
> (PigTupleWritableComparator for output and input and PigTupleSortComparator 
> in POShuffleTezLoad) without adding APIs in Tez to get the raw bytes of the 
> keys, because when we compare multiple inputs for the min key in 
> POShuffleTezLoad, their raw bytes need to be compared to maintain the same 
> order as the map side. In mapreduce, there was only a single input and the 
> mapreduce framework sorted them together. But in Tez, the join inputs are 
> sorted separately and the application only gets the serialized key. We need 
> APIs in Tez KeyValuesReader to get the bytes of the current key as well, 
> which can be used in POShuffleTezLoad for the min key comparison.
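The min-key selection over raw serialized bytes that such an API would enable can be sketched as an unsigned lexicographic comparison; this is illustrative code, not POShuffleTezLoad itself:

```java
public class MinKeyByBytes {
    // Unsigned lexicographic comparison of two serialized keys, analogous
    // to a bytes-only WritableComparator comparison.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff); // unsigned byte compare
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Index of the input whose current key is smallest, without deserializing.
    public static int minInput(byte[][] currentKeys) {
        int min = 0;
        for (int i = 1; i < currentKeys.length; i++) {
            if (compareBytes(currentKeys[i], currentKeys[min]) < 0) min = i;
        }
        return min;
    }

    public static void main(String[] args) {
        byte[][] keys = { {0x02, 0x10}, {0x01, (byte) 0xff}, {0x02, 0x00} };
        System.out.println(minInput(keys)); // 1
    }
}
```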





[jira] [Updated] (PIG-5075) Pig ORCStorage with Snappy Compression will fail with NoClassDefFoundError org/iq80/snappy/Snappy

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5075:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Pig ORCStorage with Snappy Compression will fail with NoClassDefFoundError 
> org/iq80/snappy/Snappy
> -
>
> Key: PIG-5075
> URL: https://issues.apache.org/jira/browse/PIG-5075
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.16.0
>Reporter: Prabhu Joseph
> Fix For: 0.18.0
>
>
> Pig Script to store a text file into ORC with Snappy compression enabled 
> fails with java.lang.NoClassDefFoundError: org/iq80/snappy/Snappy
> The hive-exec jar which comes with Pig does not have the Snappy classes, 
> whereas the hive-exec jar that comes with Hive does.
> {code}
> [root@prabhuSpark3 lib]# jar tvf 
> /usr/hdp/2.4.2.0-258/pig/lib/hive-exec-1.2.1000.2.4.2.0-258-core.jar | grep 
> iq80
> [root@prabhuSpark3 lib]#
> [root@prabhuSpark3 lib]# jar tvf 
> /usr/hdp/2.4.2.0-258/hive/lib/hive-exec-1.2.1000.2.4.2.0-258.jar | grep iq80
> 0 Mon Apr 25 06:49:28 UTC 2016 org/iq80/
> 0 Mon Apr 25 06:49:28 UTC 2016 org/iq80/snappy/
> 1577 Mon Apr 25 06:49:28 UTC 2016 org/iq80/snappy/Snappy.class 
> {code}
> Repro:
> {code}
> [root@prabhuSpark3 lib]# hadoop fs -cat /tmp/data
> hadoop,5
> hive,4
> pig,3
> tez,2
> hawq,1
> MYFILE = LOAD '/tmp/data' using PigStorage(',') As (name:chararray,age:int);
> Store MYFILE into '/tmp/orcsnappydata' using OrcStorage('-c SNAPPY');
> 2016-09-22 03:29:06,830 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.Launcher - Backend error message
> Error: org/iq80/snappy/Snappy
> 2016-09-22 03:29:06,831 [main] ERROR org.apache.pig.tools.pigstats.PigStats - 
> ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: 
> Unable to recreate exception from backed error: Error: org/iq80/snappy/Snappy
> 2016-09-22 03:29:06,831 [main] ERROR 
> org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) 
> failed!
> {code}
> Workaround:
> Register /usr/hdp/2.4.2.0-258/hive/lib/hive-exec-1.2.1000.2.4.2.0-258.jar; 
> As part of this bug, we want to include the Snappy classes in the hive-exec 
> jar that ships with Pig.





[jira] [Updated] (PIG-4959) Tez autoparallelism estimation for skewed join is higher than mapreduce

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4959:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Tez autoparallelism estimation for skewed join is higher than mapreduce
> ---
>
> Key: PIG-4959
> URL: https://issues.apache.org/jira/browse/PIG-4959
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
> Details in PIG-4958





[jira] [Updated] (PIG-4548) Records Lost With Specific Combination of Commands and Streaming Function

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4548:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Records Lost With Specific Combination of Commands and Streaming Function
> -
>
> Key: PIG-4548
> URL: https://issues.apache.org/jira/browse/PIG-4548
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.14.0
> Environment: Amazon EMR (Elastic Map-Reduce) AMI 3.3.1
>Reporter: Steve T
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.18.0
>
>
> The below is the bare minimum I was able to extract from my original
> problem in order to demonstrate the bug.  So, don't expect the following
> code to serve any practical purpose.  :)
> My input file (test_in) is two columns with a tab delimiter:
> 1   F
> 2   F
> My streaming function (sf.py) ignores the actual input and simply generates
> 2 records:
> #!/usr/bin/python
> if __name__ == '__main__':
>     print 'x'
>     print 'y'
> (But I should mention that in my original problem the input to output was
> one-to-one.  I just ignored the input here to get to the bare minimum
> effect.)
> My pig script:
> MY_INPUT = load 'test_in' as ( f1, f2);
> split MY_INPUT into T if (f2 == 'T'), F otherwise;
> T2 = group T by f1;
> store T2 into 'test_out/T2';
> F2 = group F by f1;
> store F2 into 'test_out/F2';  -- (this line is actually optional to demo
> the bug)
> F3 = stream F2 through `sf.py`;
> store F3 into 'test_out/F3';
> My expected output for test_out/F3 is two records that come directly from
> sf.py:
> x
> y
> However, I only get:
> x
> I've tried all of the following to get the expected behavior:
>- upgraded Pig from 0.12.0 to 0.14.0
>- local vs. distributed mode
>- flush sys.stdout in the streaming function
>- replace sf.py with sf.sh, a bash script that used "echo x;
>echo y" to do the same thing.  In this case, the final contents of
>test_out/F3 would vary - sometimes I would get both x and y, and sometimes
>I would just get x.
> Aside from removing the one Pig line that I've marked optional, any other
> attempts to simplify the Pig script or input file causes the bug to not
> manifest.
> Log files can be found at 
> http://www.mail-archive.com/user@pig.apache.org/msg10195.html





[jira] [Updated] (PIG-4647) OrcStorage should refer to shaded kryo

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4647:

Fix Version/s: (was: 0.17.0)
   0.18.0

> OrcStorage should refer to shaded kryo
> --
>
> Key: PIG-4647
> URL: https://issues.apache.org/jira/browse/PIG-4647
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
>  Hive has shaded kryo.jar as org/apache/hive/com/esotericsoftware/kryo... . 
> We should refer to that in OrcStorage so that -useHCatalog works for 
> OrcStorage, instead of the additional step of making the user register 
> kryo.jar separately and also pass it via -cp for OrcStorage to work. 





[jira] [Updated] (PIG-4764) Make Pig work with Hive 2.0

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4764:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Make Pig work with Hive 2.0
> ---
>
> Key: PIG-4764
> URL: https://issues.apache.org/jira/browse/PIG-4764
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
> Attachments: PIG-4764-0.patch, PIG-4764-1.patch, PIG-4764-2.patch, 
> PIG-4764-3.patch, PIG-4764-4.patch
>
>
> There are a lot of changes especially around ORC in Hive 2.0. We need to make 
> Pig work with it.





[jira] [Updated] (PIG-4130) Store/Load the same file fails for AvroStorage/OrcStorage, etc

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4130:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Store/Load the same file fails for AvroStorage/OrcStorage, etc
> --
>
> Key: PIG-4130
> URL: https://issues.apache.org/jira/browse/PIG-4130
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.18.0
>
>
> The following script fail:
> {code}
> a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name:chararray, 
> age:int, gpa:float);
> store a into 'Avro.intermediate' using OrcStorage();
> b = load 'Avro.intermediate' using OrcStorage();
> c = filter b by age < 30;
> store c into 'ooo';
> {code}
> Message:
>  Invalid field projection. Projected 
> field \[age\] does not exist.
> If we put an "exec" after the first store, the script succeeds.
> Pig does compile the script into two MR jobs, and correctly figures out the 
> dependency between the two, but it still needs to go to "Avro.intermediate" 
> for the schema of b when compiling, and at this time "Avro.intermediate" does 
> not exist. This also happens to other loaders which need to get the schema 
> from the input file, such as OrcStorage, etc.





[jira] [Updated] (PIG-4767) Partition filter not pushed down when filter clause references variable from another load path

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4767:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Partition filter not pushed down when filter clause references variable from 
> another load path
> --
>
> Key: PIG-4767
> URL: https://issues.apache.org/jira/browse/PIG-4767
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Anthony Hsu
>Assignee: Koji Noguchi
> Fix For: 0.18.0
>
>
> To reproduce:
> {noformat:title=test.pig}
> a = load 'a.txt';
> a_group = group a all;
> a_count = foreach a_group generate COUNT(a) as count;
> b = load 'mytable' using org.apache.hcatalog.pig.HCatLoader();
> b = filter b by datepartition == '2015-09-01-00' and foo == a_count.count;
> dump b;
> {noformat}
> The above query ends up reading all the table partitions. If you remove the 
> {{foo == a_count.count}} clause or replace {{a_count.count}} with a constant, 
> then partition filtering happens properly.





[jira] [Updated] (PIG-4785) Optimize multi-query plan for diamond shape edges

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4785:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Optimize multi-query plan for diamond shape edges
> -
>
> Key: PIG-4785
> URL: https://issues.apache.org/jira/browse/PIG-4785
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
>   If there is a diamond shaped edge (two edges going to the same vertex), we 
> do not merge into the Split, and a lot of data is transferred because of 
> that. It can be optimized to merge the operator into the Split, but still 
> have a POValueInputTez->POValueOutputTez vertex which will just be used to 
> redirect the input to avoid the diamond shaped edge.  This will allow 
> filtering and other processing to happen in the Split operator itself, and 
> the data transferred to the routing vertex will be minimal.





[jira] [Updated] (PIG-4428) Support UDFContext style getProperties() for different UDFs in Tez ObjectCache

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4428:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Support UDFContext style getProperties() for different UDFs in Tez ObjectCache
> --
>
> Key: PIG-4428
> URL: https://issues.apache.org/jira/browse/PIG-4428
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
>  Maintain another level of map in the ObjectRegistry and return that when 
> the user specifies the UDF class and signature.
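The extra level of keying can be sketched as a map keyed by the (UDF class, signature) pair; ScopedObjectRegistry and getProperties below are hypothetical names, not Pig's actual ObjectRegistry API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ScopedObjectRegistry {
    // One Properties instance per (UDF class, signature) pair, so different
    // UDF instances do not clobber each other's entries.
    private final Map<String, Properties> perUdf = new HashMap<>();

    public Properties getProperties(Class<?> udfClass, String signature) {
        String key = udfClass.getName() + ":" + signature;
        return perUdf.computeIfAbsent(key, k -> new Properties());
    }

    public static void main(String[] args) {
        ScopedObjectRegistry reg = new ScopedObjectRegistry();
        // String.class stands in for a UDF class here.
        reg.getProperties(String.class, "sig1").setProperty("k", "v");
        // A different signature gets an isolated Properties instance.
        System.out.println(reg.getProperties(String.class, "sig2").getProperty("k")); // null
        System.out.println(reg.getProperties(String.class, "sig1").getProperty("k")); // v
    }
}
```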





[jira] [Updated] (PIG-4617) XML loader is not working fine with pig 0.14 version

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4617:

Fix Version/s: (was: 0.17.0)
   0.18.0

> XML loader is not working fine with pig 0.14 version
> 
>
> Key: PIG-4617
> URL: https://issues.apache.org/jira/browse/PIG-4617
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank, UI
>Reporter: vijayalakshmi karasani
>Assignee: Rohini Palaniswamy
>Priority: Blocker
> Fix For: 0.18.0
>
>
> My old pig script (to load xml files and parse them), which ran successfully 
> with pig 0.13, is not running with pig 0.14 and is throwing 
> java.lang.IndexOutOfBoundsException: start 4, end 2, s.length() 2. 
> Out of my 10 xml files, 2 are running fine and the other 8 are not. All these 
> xml files ran successfully with pig 0.13. Maybe in the new version you have 
> added more validations for well-formedness of xml files.
> My Code:
> REGISTER '/usr/hdp/current/pig-client/lib/piggybank.jar';
> C =  LOAD '/common/data/dia/stepxml/*' using 
> org.apache.pig.piggybank.storage.XMLLoader('Product') as (x:chararray);
> STORE C into '/common/data/dia/intermediate_xmls/Imn_Unique_both2';
> ERROR:
> 2015-06-30 13:12:28,409 FATAL [IPC Server handler 3 on 34318] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1434729076270_34899_m_15_0 - exited : 
> java.lang.IndexOutOfBoundsException: start 4, end 2, s.length() 2
>   at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:476)
>   at java.lang.StringBuffer.append(StringBuffer.java:309)
> Input(s):
> Failed to read data from "/common/data/dia/stepxml/*"
> Output(s):
> Failed to produce result in 
> "/common/data/dia/intermediate_xmls/Imn_Unique_both2"





[jira] [Updated] (PIG-4662) New optimizer rule: filter nulls before inner joins

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4662:

Fix Version/s: (was: 0.17.0)
   0.18.0

> New optimizer rule: filter nulls before inner joins
> ---
>
> Key: PIG-4662
> URL: https://issues.apache.org/jira/browse/PIG-4662
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ido Hadanny
>Assignee: Satish Subhashrao Saley
>Priority: Minor
>  Labels: Performance
> Fix For: 0.18.0
>
>
> As stated in the docs, rewriting an inner join and filtering nulls from 
> inputs can be a big performance gain: 
> http://pig.apache.org/docs/r0.14.0/perf.html#nulls
> We would like to add an optimizer rule which detects inner joins, and filters 
> nulls in all inputs:
> A = filter A by t is not null;
> B = filter B by x is not null;
> C = join A by t, B by x;
> see also: 
> http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining





[jira] [Updated] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4567:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Allow UDFs to specify a counter increment other than default of 1
> -
>
> Key: PIG-4567
> URL: https://issues.apache.org/jira/browse/PIG-4567
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
> Fix For: 0.18.0
>
>
> Current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method 
> to report counters which increments by 1. 
> {code}
> public final void warn(String msg, Enum warningEnum)
> {code}
> It would be more flexible to have an additional method that takes in an 
> argument to increment the counter by.
> {code}
> public final void warn(String msg, Enum warningEnum, long incr)
> {code}
> This will be useful when you might have, for instance, several fields within 
> the same row that are bad and you want the counter to reflect that. Making 
> repetitive "warn" calls is not ideal.





[jira] [Updated] (PIG-4579) casting of primitive datetime data type should work

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4579:

Fix Version/s: (was: 0.17.0)
   0.18.0

> casting of primitive datetime data type should work
> ---
>
> Key: PIG-4579
> URL: https://issues.apache.org/jira/browse/PIG-4579
> Project: Pig
>  Issue Type: Improvement
>Reporter: Michael Howard
>Priority: Minor
> Fix For: 0.18.0
>
>
> datetime is a primitive data type. 
> One should be able to cast a chararray or a long into a datetime. 
> Currently, this does not work. 
> Casting from a chararray should call the built-in UDF ToDateISO(chararray); 
> casting from a long should call the built-in UDF ToDate(long).





[jira] [Updated] (PIG-4658) Reduce key comparisons in TezAccumulativeTupleBuffer

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4658:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Reduce key comparisons in TezAccumulativeTupleBuffer
> 
>
> Key: PIG-4658
> URL: https://issues.apache.org/jira/browse/PIG-4658
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
>
>Currently the Accumulator is applicable only for group by.  
> TezAccumulativeTupleBuffer supports more than one tez input and the code for 
> that adds a lot of additional comparisons. We can make it support only one 
> tez input and get rid of a couple of comparisons. 





[jira] [Updated] (PIG-4932) Cache files not loaded when using 'limit' operator

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4932:

Fix Version/s: (was: 0.17.0)
   0.18.0

> Cache files not loaded when using 'limit' operator
> --
>
> Key: PIG-4932
> URL: https://issues.apache.org/jira/browse/PIG-4932
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: hemanth meka
>Assignee: Daniel Dai
> Fix For: 0.18.0
>
>
> A UDF in pig throws an error when input is fed to the UDF after applying the 
> LIMIT operator. The UDF is not able to find the cache file when using LIMIT:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught 
> error from UDF: org.test.hadoop.pig.BagProcess [Caught exception: File 
> './names_cache' does not exist]
> By removing the LIMIT and directly feeding the input to the UDF, it runs 
> fine. The LIMIT operator seems to not load the cache files, causing the 
> issue. I was able to reproduce this on two different clusters running 
> version 0.14.0 of Hive.





[jira] [Commented] (PIG-4662) New optimizer rule: filter nulls before inner joins

2017-05-24 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023848#comment-16023848
 ] 

Rohini Palaniswamy commented on PIG-4662:
-

Have already done this in POBuildBloomRearrangeTez. You can refer to that.

> New optimizer rule: filter nulls before inner joins
> ---
>
> Key: PIG-4662
> URL: https://issues.apache.org/jira/browse/PIG-4662
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ido Hadanny
>Assignee: Satish Subhashrao Saley
>Priority: Minor
>  Labels: Performance
> Fix For: 0.17.0
>
>
> As stated in the docs, rewriting an inner join and filtering nulls from 
> inputs can be a big performance gain: 
> http://pig.apache.org/docs/r0.14.0/perf.html#nulls
> We would like to add an optimizer rule which detects inner joins, and filters 
> nulls in all inputs:
> A = filter A by t is not null;
> B = filter B by x is not null;
> C = join A by t, B by x;
> see also: 
> http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining





[jira] [Assigned] (PIG-4662) New optimizer rule: filter nulls before inner joins

2017-05-24 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley reassigned PIG-4662:


Assignee: Satish Subhashrao Saley

> New optimizer rule: filter nulls before inner joins
> ---
>
> Key: PIG-4662
> URL: https://issues.apache.org/jira/browse/PIG-4662
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ido Hadanny
>Assignee: Satish Subhashrao Saley
>Priority: Minor
>  Labels: Performance
> Fix For: 0.17.0
>
>
> As stated in the docs, rewriting an inner join and filtering nulls from 
> inputs can be a big performance gain: 
> http://pig.apache.org/docs/r0.14.0/perf.html#nulls
> We would like to add an optimizer rule which detects inner joins, and filters 
> nulls in all inputs:
> A = filter A by t is not null;
> B = filter B by x is not null;
> C = join A by t, B by x;
> see also: 
> http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining





[jira] [Updated] (PIG-4700) Pig should call ProcessorContext.setProgress() in TezTaskContext

2017-05-24 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-4700:
-
Attachment: PIG-4700-1.patch

> Pig should call ProcessorContext.setProgress() in TezTaskContext
> 
>
> Key: PIG-4700
> URL: https://issues.apache.org/jira/browse/PIG-4700
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
> Fix For: 0.17.0
>
> Attachments: PIG-4700-1.patch
>
>






[jira] [Updated] (PIG-4700) Pig should call ProcessorContext.setProgress() in TezTaskContext

2017-05-24 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley updated PIG-4700:
-
Status: Patch Available  (was: Open)

> Pig should call ProcessorContext.setProgress() in TezTaskContext
> 
>
> Key: PIG-4700
> URL: https://issues.apache.org/jira/browse/PIG-4700
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
> Fix For: 0.17.0
>
> Attachments: PIG-4700-1.patch
>
>






[jira] [Updated] (PIG-4745) DataBag should protect content of passed list of tuples

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4745:

Resolution: Invalid
Status: Resolved  (was: Patch Available)

> DataBag should protect content of passed list of tuples
> ---
>
> Key: PIG-4745
> URL: https://issues.apache.org/jira/browse/PIG-4745
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Mikko Kupsu
>Assignee: Mikko Kupsu
> Attachments: 20151125-PIG-4745.patch
>
>
> A user can corrupt the internal list of tuples when passing a list of tuples 
> to BagFactory.newDefaultBag(List), since DefaultDataBag won't make a copy 
> of the list, but will use it directly.





[jira] [Updated] (PIG-4745) DataBag should protect content of passed list of tuples

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4745:

Fix Version/s: (was: 0.17.0)

It is the responsibility of the user to make a copy and pass that in if they 
are making changes.

> DataBag should protect content of passed list of tuples
> ---
>
> Key: PIG-4745
> URL: https://issues.apache.org/jira/browse/PIG-4745
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Mikko Kupsu
>Assignee: Mikko Kupsu
> Attachments: 20151125-PIG-4745.patch
>
>
> User can corrupt the internal list of tuples when passing a list of tuples to 
> BagFactory.newDefaultBag(List), since DefaultDataBag won't make a copy 
> of the list, but will use it directly.





[jira] [Updated] (PIG-4924) Translate failures.maxpercent MR setting to Tez

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4924:

Attachment: PIG-4924-1.patch

> Translate failures.maxpercent MR setting to Tez
> ---
>
> Key: PIG-4924
> URL: https://issues.apache.org/jira/browse/PIG-4924
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.17.0
>
> Attachments: PIG-4924-1.patch
>
>
> TEZ-3271 adds support equivalent to mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. We need to translate that per vertex.
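The MR setting is a percentage of tasks allowed to fail before the job is declared failed; translating it per vertex amounts to turning that percentage into a per-vertex failure budget. A hedged Python sketch of that arithmetic follows (the helper name and the round-down behavior are illustrative assumptions, not Pig's or Tez's actual code):

```python
import math

def max_allowed_failures(num_tasks: int, failures_maxpercent: int) -> int:
    """Turn an MR-style failures.maxpercent value (0-100) into the maximum
    number of task failures tolerated for one vertex.

    Mirrors the semantics of mapreduce.map.failures.maxpercent: the job
    succeeds as long as no more than that percentage of tasks fail.
    Rounding down is an assumption made for this sketch."""
    if not 0 <= failures_maxpercent <= 100:
        raise ValueError("percent must be in [0, 100]")
    return math.floor(num_tasks * failures_maxpercent / 100)

print(max_allowed_failures(200, 5))   # 10 of 200 tasks may fail
print(max_allowed_failures(7, 10))    # 0: rounds down, so any failure fails the vertex
```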





[jira] [Updated] (PIG-4924) Translate failures.maxpercent MR setting to Tez

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4924:

Status: Patch Available  (was: Open)

> Translate failures.maxpercent MR setting to Tez
> ---
>
> Key: PIG-4924
> URL: https://issues.apache.org/jira/browse/PIG-4924
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.17.0
>
> Attachments: PIG-4924-1.patch
>
>
> TEZ-3271 adds support equivalent to mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. We need to translate that per vertex.





[jira] [Assigned] (PIG-4700) Pig should call ProcessorContext.setProgress() in TezTaskContext

2017-05-24 Thread Satish Subhashrao Saley (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Subhashrao Saley reassigned PIG-4700:


Assignee: Satish Subhashrao Saley

> Pig should call ProcessorContext.setProgress() in TezTaskContext
> 
>
> Key: PIG-4700
> URL: https://issues.apache.org/jira/browse/PIG-4700
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Satish Subhashrao Saley
> Fix For: 0.17.0
>
>






[jira] [Commented] (PIG-5185) Job name show "DefaultJobName" when running a Python script

2017-05-24 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023821#comment-16023821
 ] 

Rohini Palaniswamy commented on PIG-5185:
-

+1

> Job name show "DefaultJobName" when running a Python script
> ---
>
> Key: PIG-5185
> URL: https://issues.apache.org/jira/browse/PIG-5185
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.17.0
>
> Attachments: PIG-5185-1.patch, PIG-5185-2.patch
>
>
> Run a Python script with Pig and the Hadoop WebUI shows "DefaultJobName" instead of the 
> script name. We should use the script name, the same semantics as for a regular Pig 
> script.





Preparing for Pig 0.17 release

2017-05-24 Thread Rohini Palaniswamy
Hi all,
   We are going to merge the Pig on Spark code by Friday and then branch
for the Pig 0.17 release. We will be moving all jiras marked for 0.17 that are not
being worked on or not of high priority to 0.18 today. If you consider
anything important for 0.17, please raise a comment in that jira.

Regards,
Rohini


[jira] [Assigned] (PIG-4449) Optimize the case of Order by + Limit in nested foreach

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy reassigned PIG-4449:
---

 Assignee: Rohini Palaniswamy
Fix Version/s: 0.18.0

> Optimize the case of Order by + Limit in nested foreach
> ---
>
> Key: PIG-4449
> URL: https://issues.apache.org/jira/browse/PIG-4449
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
>  Labels: Performance
> Fix For: 0.18.0
>
>
> This is one of the very frequently used patterns
> {code}
> grouped_data_set = group data_set by id;
> capped_data_set = foreach grouped_data_set
> {
>   ordered = order data_set by timestamp desc;
>   capped = limit ordered $num;
>  generate flatten(capped);
> };
> {code}
> But this performs very poorly when there are millions of rows for a key in 
> the groupby with a lot of spills. This can be easily optimized by pushing the 
> limit into the InternalSortedBag, maintaining only $num records at any time to 
> avoid memory pressure.
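The proposed optimization, keeping at most $num ordered records instead of materializing the whole sorted bag, is essentially a bounded top-N structure. A minimal Python sketch using a heap is shown below (illustrative only; Pig's InternalSortedBag is a Java class with different internals):

```python
import heapq

class LimitedSortedBag:
    """Keeps only the `limit` items with the largest keys, so memory stays
    O(limit) no matter how many rows are fed in -- the idea behind pushing
    the LIMIT into the sorted bag."""
    def __init__(self, limit):
        self.limit = limit
        self._heap = []                    # min-heap of (key, item)

    def add(self, key, item):
        if len(self._heap) < self.limit:
            heapq.heappush(self._heap, (key, item))
        elif key > self._heap[0][0]:
            heapq.heapreplace(self._heap, (key, item))  # evict current smallest

    def items_desc(self):
        """Items sorted by key descending, like `order ... desc` + limit."""
        return [it for _, it in sorted(self._heap, reverse=True)]

bag = LimitedSortedBag(3)
for ts in [5, 1, 9, 3, 7, 2]:
    bag.add(ts, f"row@{ts}")

print(bag.items_desc())   # ['row@9', 'row@7', 'row@5']
```

Each `add` is O(log limit), and the bag never spills because it never holds more than `limit` records.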





Re: Review Request 59530: PIG-5157 Upgrade to Spark 2.0

2017-05-24 Thread Adam Szita


> On May 24, 2017, 9:21 p.m., Rohini Palaniswamy wrote:
> > build.xml
> > Lines 251 (patched)
> > 
> >
> > Can just load one property file for spark2, leaving the spark 1.6 
> > version in libraries.properties

AFAIK property values in Ant are immutable, so once we set e.g. 
spark.version=1.6.1 by loading the default property file and then load the 
other property file for spark2, I think it will stay unchanged and we cannot 
override it with anything else, e.g. spark.version=2.1.1.
Also, I think it looks clearer this way.


- Adam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59530/#review175998
---


On May 24, 2017, 4:21 p.m., Nandor Kollar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59530/
> ---
> 
> (Updated May 24, 2017, 4:21 p.m.)
> 
> 
> Review request for pig, liyun zhang, Rohini Palaniswamy, and Adam Szita.
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> Upgrade to Spark 2.1 API using shims.
> 
> 
> Diffs
> -
> 
>   build.xml 4040fcec8f88d448ed7442461fbf0dea8cd1136e 
>   ivy/libraries.properties a0eb00acd2df42324540df4a9d762c64c608a6d3 
>   ivy/spark1.properties PRE-CREATION 
>   ivy/spark2.properties PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
>  PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractFlatMapFunction.java
>  PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractLimitConverter.java
>  PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractPairFlatMapFunction.java
>  PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinUtil.java
>  PRE-CREATION 
>   shims/src/spark16/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 
> PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
>  PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractFlatMapFunction.java
>  PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractLimitConverter.java
>  PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractPairFlatMapFunction.java
>  PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinUtil.java
>  PRE-CREATION 
>   shims/src/spark21/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 
> PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
>  f81341233447203abc4800cc7b22a4f419e10262 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 
> c6351e01a48f297ea2e432401ffd65c4f27f8078 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/CollectedGroupConverter.java
>  83311dfa5bb25209a5366c2db7e8d483c31d94cd 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/FRJoinConverter.java
>  382258e7ff9105aa397c5a2888df0c11e9562ec9 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java
>  b58415e7e18ca4cf1331beef06e9214600a51424 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java
>  f571b808839c2de9415a3e8e4b229a7f4b2eebd7 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LimitConverter.java
>  fe1b54c8f128661d7d19c276d3bb2de7874d3086 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeCogroupConverter.java
>  adf78ecab0da10d3b1a7fdde8af2b42dd899810f 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
>  d1c43b1e06adc4c9fe45a83b8110402e3756 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PoissonSampleConverter.java
>  e003bbd95763b2d189ff9ec540c89abe52592420 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java
>  00d29b44848546ed16dde2baa8c61b36939971b2 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinConverter.java
>  c55ba3145495a53d69db2dd56434dcc9b3bf8ed5 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java
>  baabfa090323e3bef087e259ce19df2e4c34dd63 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SparkSampleSortConverter.java
>  3166fdc31745c

[jira] [Resolved] (PIG-5204) Implement illustrate in Spark

2017-05-24 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita resolved PIG-5204.
-
Resolution: Duplicate

> Implement illustrate in Spark
> -
>
> Key: PIG-5204
> URL: https://issues.apache.org/jira/browse/PIG-5204
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: spark-branch
>Reporter: Nandor Kollar
>Priority: Minor
>
> Illustrate is not supported in Spark exec type right now.





[jira] [Assigned] (PIG-5204) Implement illustrate in Spark

2017-05-24 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita reassigned PIG-5204:
---

Assignee: (was: Adam Szita)

> Implement illustrate in Spark
> -
>
> Key: PIG-5204
> URL: https://issues.apache.org/jira/browse/PIG-5204
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: spark-branch
>Reporter: Nandor Kollar
>Priority: Minor
>
> Illustrate is not supported in Spark exec type right now.





[jira] [Commented] (PIG-4621) Enable Illustrate in spark

2017-05-24 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023727#comment-16023727
 ] 

Adam Szita commented on PIG-4621:
-

I'd be happy to take this task. [~Pratyy], [~kellyzly], shall I reassign it to 
myself?

> Enable Illustrate in spark
> --
>
> Key: PIG-4621
> URL: https://issues.apache.org/jira/browse/PIG-4621
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: Prateek Vaishnav
> Fix For: spark-branch
>
>
> Currently we don't support illustrate in Spark mode.
> To see how illustrate works, see:
> http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#ILLUSTRATE





Re: Review Request 59530: PIG-5157 Upgrade to Spark 2.0

2017-05-24 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59530/#review175998
---




build.xml
Lines 243 (patched)


Please have sparkversion values as just 1 and 2, and name the shims directories the same way as well.



build.xml
Lines 251 (patched)


Can just load one property file for spark2, leaving the spark 1.6 version 
in libraries.properties


- Rohini Palaniswamy


On May 24, 2017, 4:21 p.m., Nandor Kollar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59530/
> ---
> 
> (Updated May 24, 2017, 4:21 p.m.)
> 
> 
> Review request for pig, liyun zhang, Rohini Palaniswamy, and Adam Szita.
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> Upgrade to Spark 2.1 API using shims.
> 
> 
> Diffs
> -
> 
>   build.xml 4040fcec8f88d448ed7442461fbf0dea8cd1136e 
>   ivy/libraries.properties a0eb00acd2df42324540df4a9d762c64c608a6d3 
>   ivy/spark1.properties PRE-CREATION 
>   ivy/spark2.properties PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
>  PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractFlatMapFunction.java
>  PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractLimitConverter.java
>  PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractPairFlatMapFunction.java
>  PRE-CREATION 
>   
> shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinUtil.java
>  PRE-CREATION 
>   shims/src/spark16/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 
> PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
>  PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractFlatMapFunction.java
>  PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractLimitConverter.java
>  PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractPairFlatMapFunction.java
>  PRE-CREATION 
>   
> shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinUtil.java
>  PRE-CREATION 
>   shims/src/spark21/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 
> PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
>  f81341233447203abc4800cc7b22a4f419e10262 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 
> c6351e01a48f297ea2e432401ffd65c4f27f8078 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/CollectedGroupConverter.java
>  83311dfa5bb25209a5366c2db7e8d483c31d94cd 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/FRJoinConverter.java
>  382258e7ff9105aa397c5a2888df0c11e9562ec9 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java
>  b58415e7e18ca4cf1331beef06e9214600a51424 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java
>  f571b808839c2de9415a3e8e4b229a7f4b2eebd7 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LimitConverter.java
>  fe1b54c8f128661d7d19c276d3bb2de7874d3086 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeCogroupConverter.java
>  adf78ecab0da10d3b1a7fdde8af2b42dd899810f 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
>  d1c43b1e06adc4c9fe45a83b8110402e3756 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PoissonSampleConverter.java
>  e003bbd95763b2d189ff9ec540c89abe52592420 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java
>  00d29b44848546ed16dde2baa8c61b36939971b2 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinConverter.java
>  c55ba3145495a53d69db2dd56434dcc9b3bf8ed5 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java
>  baabfa090323e3bef087e259ce19df2e4c34dd63 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SparkSampleSortConverter.java
>  3166fdc31745c013380492e089c83f3e853a3e6e 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java
>  3a50d485cfd54b9f3b9c1a982e6c30497a4c85fc 
>   src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 
> c8cc03

Re: Review Request 57317: Support Pig On Spark

2017-05-24 Thread Rohini Palaniswamy


> On March 21, 2017, 8:36 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java
> > Lines 387-388 (patched)
> > 
> >
> > Why copy to local directory if already in hdfs?
> 
> kelly zhang wrote:
> We copy the cache file, which is stored in HDFS, to the local filesystem and 
> upload it to the Spark cluster via sparkContext#addFiles so that these 
> files can be downloaded by the Spark workers.

You should be able to specify the hdfs path directly to spark and avoid the 
unnecessary download and upload. Oozie does it for spark action. Please create 
a jira for this. Can be fixed later.
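The suggested change boils down to inspecting the cache file's URL scheme: only genuinely local files need the download/upload round trip, while hdfs:// (or other cluster-addressable) paths could be handed to Spark directly. A hedged Python sketch of that dispatch logic follows (the function name and scheme handling are illustrative assumptions, not Pig's actual code):

```python
from urllib.parse import urlparse

def needs_local_staging(cache_file_url: str) -> bool:
    """Return True only when the cache file must be copied to the local
    filesystem before being registered with the Spark context.

    Files already addressable by the cluster (e.g. hdfs://) can be passed
    to Spark as-is, skipping the download-then-upload round trip."""
    scheme = urlparse(cache_file_url).scheme
    # No scheme, or an explicit file:// scheme, means a local path.
    return scheme in ("", "file")

print(needs_local_staging("/tmp/udf.jar"))                    # True
print(needs_local_staging("file:///tmp/udf.jar"))             # True
print(needs_local_staging("hdfs://nn:8020/user/pig/udf.jar")) # False
```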


- Rohini


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57317/#review169547
---


On May 18, 2017, 8:06 a.m., kelly zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57317/
> ---
> 
> (Updated May 18, 2017, 8:06 a.m.)
> 
> 
> Review request for pig, Daniel Dai and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-4059 and PIG-4854;
> https://issues.apache.org/jira/browse/PIG-4059
> https://issues.apache.org/jira/browse/PIG-4854;
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> Merge all changes from spark branch
> 
> 
> Diffs
> -
> 
>   bin/pig e1212fa 
>   build.xml a0d2ca8 
>   ivy.xml 42daec9 
>   ivy/libraries.properties 481066e 
>   src/META-INF/services/org.apache.pig.ExecType 5c034c8 
>   src/docs/src/documentation/content/xdocs/start.xml c9a1491 
>   src/org/apache/pig/PigConfiguration.java d25f81a 
>   src/org/apache/pig/PigWarning.java fcda114 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/AccumulatorOptimizer.java
>  ac03d40 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/NoopFilterRemover.java
>  4d91556 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/NoopFilterRemoverUtil.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigHadoopLogger.java
>  255650e 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java
>  6fe8ff3 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigSplit.java
>  e866b28 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/SecondaryKeyOptimizerMR.java
>  8170f02 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
>  0e35273 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
>  ecf780c 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
>  3bad98b 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhysicalPlan.java
>  2376d03 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POBroadcastSpark.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCollectedGroup.java
>  bcbfe2b 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
>  d80951a 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoinSpark.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java
>  4dc6d54 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POGlobalRearrange.java
>  52cfb73 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeCogroup.java
>  4923d3f 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
>  13f70c0 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPoissonSample.java
>  f2830c2 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java
>  c3a82c3 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/JobGraphBuilder.java 
> PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
>  PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/KryoSerializer.java 
> PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/MapReducePartitionerWrapper.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkEngineConf.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecType.java 
> PRE-CREATION 
>   
> src/org/apache/pig/bac

Re: Review Request 57317: Support Pig On Spark

2017-05-24 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57317/#review175872
---




src/org/apache/pig/backend/hadoop/executionengine/spark/JobGraphBuilder.java
Line 262 (original), 263 (patched)


Typo. successorPlan



src/org/apache/pig/backend/hadoop/executionengine/spark/JobGraphBuilder.java
Line 265 (original), 266 (patched)


LOG.error redundant when exception is being thrown



src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java
Lines 166-167 (original), 170 (patched)


if (LOG.isDebugEnabled())
   LOG.debug(sparkplan);

Can you also print the final spark plan (after optimizations and 
conversions) for the new setting introduced in PIG-5210.



src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java
Line 325 (original)


This seems to be removed by mistake



src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkOperator.java
Lines 272 (patched)


Formatting is off



src/org/apache/pig/backend/hadoop/executionengine/util/SecondaryKeyOptimizerUtil.java
Line 56 (original), 56 (patched)


Should still be Private



src/org/apache/pig/impl/util/UDFContext.java
Lines 210-212 (patched)


Remove this and add

/*
 *  Internal pig use 
 */



src/org/apache/pig/backend/hadoop/executionengine/spark/SparkPigContext.java
Lines 34 (patched)


defaultParallelism

Only final constant variables should be in full upper case.



src/org/apache/pig/backend/hadoop/executionengine/spark/SparkPigContext.java
Lines 51 (patched)


Shouldn't default parallelism returned if requested parallelism <= 0 ?



test/e2e/pig/tests/nightly.conf
Lines 2307 (patched)


Testing distinct + orderby + limit serves the same purpose as the orderby + 
limit test. Can you remove orderby from this test? If distinct + limit differs 
every time even with Spark and a different verify_pig_script run, just ignore 
the test for now, adding a TODO to test num 4.



test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java
Lines 66 (patched)


Can you add a TODO to look into the repetition of A[3,4] later?



test/org/apache/pig/test/TestCombiner.java
Lines 122-125 (original), 126-129 (patched)


Utils.checkQueryOutputsAfterSort(resultIterator, resultTuples);



test/org/apache/pig/test/TestCubeOperator.java
Lines 569 (patched)


illustrate does not work in spark



test/org/apache/pig/test/TestEmptyInputDir.java
Line 88 (original), 89 (patched)


assertEmptyOutputFile(); is missing after assumeTrue



test/org/apache/pig/test/TestGrunt.java
Lines 938 (patched)


Move this to beginning of the test



test/org/apache/pig/test/TestGrunt.java
Line 939 (original)


assertTrue(caught); is missing



test/org/apache/pig/test/TestPigRunner.java
Lines 216 (patched)


if (execType.equals("mapreduce")) {
assertEquals(2, stats.getNumberJobs());
assertEquals(stats.getJobGraph().size(), 2);
} else {
// Tez and Spark
assertEquals(1, stats.getNumberJobs());
assertEquals(stats.getJobGraph().size(), 1);
}



test/org/apache/pig/test/TestPigRunner.java
Lines 465 (patched)


Can you add a TODO here to fix this to work without -no_multiquery?



test/org/apache/pig/test/TestPigRunner.java
Line 508 (original), 548 (patched)


if(execType.equals("mapreduce") {
...
} else {
// Tez and spark
...
}



test/org/apache/pig/test/TestPigServer.java
Lines 550-552 (original), 550-553 (patched)


Assume.assumeTrue("Skip this test for TEZ", 
Util.isMapredExecType(cluster.getExecType()) || 
Util.isSparkExecType(cluster.getExecType()));



test/org/apache/pig/test/TestPigSe

Review Request 59530: PIG-5157 Upgrade to Spark 2.0

2017-05-24 Thread Nandor Kollar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59530/
---

Review request for pig, liyun zhang, Rohini Palaniswamy, and Adam Szita.


Repository: pig-git


Description
---

Upgrade to Spark 2.1 API using shims.


Diffs
-

  build.xml 4040fcec8f88d448ed7442461fbf0dea8cd1136e 
  ivy/libraries.properties a0eb00acd2df42324540df4a9d762c64c608a6d3 
  ivy/spark1.properties PRE-CREATION 
  ivy/spark2.properties PRE-CREATION 
  
shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
 PRE-CREATION 
  
shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractFlatMapFunction.java
 PRE-CREATION 
  
shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractLimitConverter.java
 PRE-CREATION 
  
shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractPairFlatMapFunction.java
 PRE-CREATION 
  
shims/src/spark16/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinUtil.java
 PRE-CREATION 
  shims/src/spark16/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 
PRE-CREATION 
  
shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
 PRE-CREATION 
  
shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractFlatMapFunction.java
 PRE-CREATION 
  
shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractLimitConverter.java
 PRE-CREATION 
  
shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/AbstractPairFlatMapFunction.java
 PRE-CREATION 
  
shims/src/spark21/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinUtil.java
 PRE-CREATION 
  shims/src/spark21/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 
PRE-CREATION 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java 
f81341233447203abc4800cc7b22a4f419e10262 
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 
c6351e01a48f297ea2e432401ffd65c4f27f8078 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/CollectedGroupConverter.java
 83311dfa5bb25209a5366c2db7e8d483c31d94cd 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/FRJoinConverter.java
 382258e7ff9105aa397c5a2888df0c11e9562ec9 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java
 b58415e7e18ca4cf1331beef06e9214600a51424 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java
 f571b808839c2de9415a3e8e4b229a7f4b2eebd7 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LimitConverter.java
 fe1b54c8f128661d7d19c276d3bb2de7874d3086 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeCogroupConverter.java
 adf78ecab0da10d3b1a7fdde8af2b42dd899810f 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
 d1c43b1e06adc4c9fe45a83b8110402e3756 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PoissonSampleConverter.java
 e003bbd95763b2d189ff9ec540c89abe52592420 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java
 00d29b44848546ed16dde2baa8c61b36939971b2 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinConverter.java
 c55ba3145495a53d69db2dd56434dcc9b3bf8ed5 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java
 baabfa090323e3bef087e259ce19df2e4c34dd63 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SparkSampleSortConverter.java
 3166fdc31745c013380492e089c83f3e853a3e6e 
  
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java
 3a50d485cfd54b9f3b9c1a982e6c30497a4c85fc 
  src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java 
c8cc03194b223d2ee181d73c6b651a6872cac6b6 


Diff: https://reviews.apache.org/r/59530/diff/1/


Testing
---


Thanks,

Nandor Kollar



[jira] [Resolved] (PIG-2648) MapReduceLauncher squashes unchecked exceptions

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-2648.
-
Resolution: Duplicate

> MapReduceLauncher squashes unchecked exceptions
> ---
>
> Key: PIG-2648
> URL: https://issues.apache.org/jira/browse/PIG-2648
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Alex Levenson
>
> in:
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.JobControlThreadExceptionHandler
> {code}
> jobControlExceptionStackTrace = getStackStraceStr(throwable);
> try { 
> jobControlException = 
> getExceptionFromString(jobControlExceptionStackTrace);
> } catch (Exception e) {
> String errMsg = "Could not resolve error that occured when launching map 
> reduce job: "
> + jobControlExceptionStackTrace;
> jobControlException = new RuntimeException(errMsg);
> }
> {code}
> The catch clause does not chain the original exception; this made tracking 
> down https://issues.apache.org/jira/browse/PIG-2645 a lot more difficult.
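The fix the report implies is standard exception chaining: attach the original throwable as the cause instead of discarding it. A small Python illustration of the difference follows (hedged: Pig's code is Java; this only demonstrates the chaining idea, and `parse_flaky` is a made-up stand-in for `getExceptionFromString`):

```python
def parse_flaky(text):
    return int(text)   # stand-in for the call that may itself fail

def launch_unchained(text):
    try:
        return parse_flaky(text)
    except Exception:
        # Bug pattern: the original exception is squashed; no cause attached.
        raise RuntimeError("Could not resolve error: " + text)

def launch_chained(text):
    try:
        return parse_flaky(text)
    except Exception as e:
        # Fix: chain the original so the root cause survives in tracebacks.
        raise RuntimeError("Could not resolve error: " + text) from e

try:
    launch_chained("not-a-number")
except RuntimeError as err:
    print(type(err.__cause__).__name__)   # ValueError: root cause preserved

try:
    launch_unchained("not-a-number")
except RuntimeError as err:
    print(err.__cause__)                  # None: root cause lost
```

In Java the equivalent is passing the throwable to the `RuntimeException(String, Throwable)` constructor.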





[jira] [Deleted] (PIG-5099) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5099:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5099
> URL: https://issues.apache.org/jira/browse/PIG-5099
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>
> Hi,
> While migrating to the latest Pig version we have seen a general issue when 
> using nested Avro records on Tez:
> {code}
> Caused by: java.io.IOException: class 
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not 
> implemented yet
>   at 
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
>   at 
> org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
> ...
> {code}
> The setup is
> schema
> {code}
> {
>   "fields": [
>     {
>       "name": "id",
>       "type": "int"
>     },
>     {
>       "name": "property",
>       "type": {
>         "fields": [
>           {
>             "name": "id",
>             "type": "int"
>           }
>         ],
>         "name": "Property",
>         "type": "record"
>       }
>     }
>   ],
>   "name": "Person",
>   "namespace": "com.github.ouyi.avro",
>   "type": "record"
> }
> {code}
> Pig script group_person.pig
> {code}
> loaded_person =
> LOAD '$input'
> USING AvroStorage();
> grouped_records =
> GROUP
> loaded_person BY (property.id);
> STORE grouped_records
> INTO '$output'
> USING AvroStorage();
> {code}
> sample data
> {code}
> {"id":1,"property":{"id":1}}
> {code}
> Execution on Tez
> {code}
> pig -x tez_local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p 
> output=file:///output group_person.pig
> ...
> Caused by: java.io.IOException: class 
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not 
> implemented yet
>   at 
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
>   at 
> org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
> ...
> {code}
> Execution on mapred
> {code}
> pig -x local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p 
> output=file:///output7 group_person.pig
> ...
> Output(s):
> Successfully stored 1 records in: "file:///output7"
> ...
> {code}
> I am going to attach the complete log files of both runs.
> I assume that the Pig script should work regardless of Tez or mapreduce? Is 
> there any underlying change when migrating to Tez which makes the schema 
> invalid?
> Thanks,
> Sebastian



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5090) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5090:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5090
> URL: https://issues.apache.org/jira/browse/PIG-5090
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5095) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5095:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5095
> URL: https://issues.apache.org/jira/browse/PIG-5095
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PIG-5108) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023133#comment-16023133
 ] 

Rohini Palaniswamy commented on PIG-5108:
-

Deleted PIG-5090 to PIG-5099, which are duplicates of this jira created during a 
JIRA outage.

> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5108
> URL: https://issues.apache.org/jira/browse/PIG-5108
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.16.0
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>Assignee: Daniel Dai
> Fix For: 0.17.0, 0.16.1
>
> Attachments: person-prop.avro, PIG-5108-1.patch, 
> PIG-5108-2-addendum.patch
>
>
> Hi,
> While migrating to the latest Pig version we have seen a general issue when 
> using nested Avro records on Tez:
> {code}
> Caused by: java.io.IOException: class 
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not 
> implemented yet
>   at 
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
>   at 
> org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
> ...
> {code}
> The setup is
> schema
> {code}
> {
>   "fields": [
>     {
>       "name": "id",
>       "type": "int"
>     },
>     {
>       "name": "property",
>       "type": {
>         "fields": [
>           {
>             "name": "id",
>             "type": "int"
>           }
>         ],
>         "name": "Property",
>         "type": "record"
>       }
>     }
>   ],
>   "name": "Person",
>   "namespace": "com.github.ouyi.avro",
>   "type": "record"
> }
> {code}
> Pig script group_person.pig
> {code}
> loaded_person =
> LOAD '$input'
> USING AvroStorage();
> grouped_records =
> GROUP
> loaded_person BY (property.id);
> STORE grouped_records
> INTO '$output'
> USING AvroStorage();
> {code}
> sample data
> {code}
> {"id":1,"property":{"id":1}}
> {code}
> Execution on Tez
> {code}
> pig -x tez_local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p 
> output=file:///output group_person.pig
> ...
> Caused by: java.io.IOException: class 
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not 
> implemented yet
>   at 
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
>   at 
> org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
> ...
> {code}
> Execution on mapred
> {code}
> pig -x local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p 
> output=file:///output7 group_person.pig
> ...
> Output(s):
> Successfully stored 1 records in: "file:///output7"
> ...
> {code}
> I am going to attach the complete log files of both runs.
> I assume that the Pig script should work regardless of Tez or mapreduce? Is 
> there any underlying change when migrating to Tez which makes the schema 
> invalid?
> Thanks,
> Sebastian
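The failure boils down to a missing serialization hook: on the Tez path, PigNullableWritable.write ends up calling write() on the Avro tuple wrapper, which throws the "not implemented yet" IOException above, while the mapred local run never exercises that path. As a minimal sketch of the Writable-style contract such a fix has to satisfy (a hypothetical ExampleTuple using plain java.io, not the actual PIG-5108 patch), write() and readFields() must round-trip every field:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class ExampleTuple {
    int id;
    int propertyId; // stands in for the nested record's id field

    // Writable-style contract: serialize every field, in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeInt(propertyId);
    }

    // Deserialization must read the fields back in exactly the same order.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        propertyId = in.readInt();
    }

    public static void main(String[] args) throws IOException {
        ExampleTuple t = new ExampleTuple();
        t.id = 1;
        t.propertyId = 1;

        // Round-trip through a byte buffer, as the shuffle would.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        t.write(new DataOutputStream(bos));
        ExampleTuple back = new ExampleTuple();
        back.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));

        System.out.println(back.id + " " + back.propertyId); // 1 1
    }
}
```

A wrapper that leaves write() throwing IOException breaks exactly when a record crosses a shuffle boundary, which is why the GROUP over the nested field triggers it here.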



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5092) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5092:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5092
> URL: https://issues.apache.org/jira/browse/PIG-5092
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5093) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5093:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5093
> URL: https://issues.apache.org/jira/browse/PIG-5093
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5094) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5094:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5094
> URL: https://issues.apache.org/jira/browse/PIG-5094
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5097) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5097:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5097
> URL: https://issues.apache.org/jira/browse/PIG-5097
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5091) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5091:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5091
> URL: https://issues.apache.org/jira/browse/PIG-5091
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5098) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5098:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5098
> URL: https://issues.apache.org/jira/browse/PIG-5098
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Deleted] (PIG-5096) AvroStorage on Tez with exception on nested records

2017-05-24 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy deleted PIG-5096:



> AvroStorage on Tez with exception on nested records
> ---
>
> Key: PIG-5096
> URL: https://issues.apache.org/jira/browse/PIG-5096
> Project: Pig
>  Issue Type: Bug
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
>Reporter: Sebastian Geller
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PIG-4266) Umbrella jira for unit tests for Spark

2017-05-24 Thread Nandor Kollar (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar reassigned PIG-4266:
--

Assignee: (was: liyunzhang_intel)

> Umbrella jira for unit tests for Spark
> --
>
> Key: PIG-4266
> URL: https://issues.apache.org/jira/browse/PIG-4266
> Project: Pig
>  Issue Type: Task
>  Components: spark
>Reporter: Praveen Rachabattuni
> Fix For: spark-branch
>
> Attachments: spark-tests
>
>
> Get all unit tests running in spark mode.
> Single unit test can be run as:
> ant -Dtestcase=TestAlgebraicEval test-spark



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PIG-4266) Umbrella jira for unit tests for Spark

2017-05-24 Thread Nandor Kollar (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar resolved PIG-4266.

Resolution: Fixed

> Umbrella jira for unit tests for Spark
> --
>
> Key: PIG-4266
> URL: https://issues.apache.org/jira/browse/PIG-4266
> Project: Pig
>  Issue Type: Task
>  Components: spark
>Reporter: Praveen Rachabattuni
> Fix For: spark-branch
>
> Attachments: spark-tests
>
>
> Get all unit tests running in spark mode.
> Single unit test can be run as:
> ant -Dtestcase=TestAlgebraicEval test-spark



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PIG-4266) Umbrella jira for unit tests for Spark

2017-05-24 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023032#comment-16023032
 ] 

Nandor Kollar commented on PIG-4266:


Closing this Jira since all unit tests are passing on spark branch.

> Umbrella jira for unit tests for Spark
> --
>
> Key: PIG-4266
> URL: https://issues.apache.org/jira/browse/PIG-4266
> Project: Pig
>  Issue Type: Task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: spark-tests
>
>
> Get all unit tests running in spark mode.
> Single unit test can be run as:
> ant -Dtestcase=TestAlgebraicEval test-spark



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PIG-4295) Enable unit test "TestPigContext" for spark

2017-05-24 Thread Nandor Kollar (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar resolved PIG-4295.

Resolution: Fixed
  Assignee: (was: liyunzhang_intel)

> Enable unit test "TestPigContext" for spark
> ---
>
> Key: PIG-4295
> URL: https://issues.apache.org/jira/browse/PIG-4295
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Affects Versions: spark-branch
>Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4295_1.patch, PIG-4295_2.patch, PIG-4295_3.patch, 
> PIG-4295_4.patch, PIG-4295_5.patch, PIG-4295.patch, 
> TEST-org.apache.pig.test.TestPigContext.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PIG-4292) Enable unit test "TestMergeJoinOuter" for spark

2017-05-24 Thread Nandor Kollar (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar resolved PIG-4292.

Resolution: Fixed

> Enable unit test "TestMergeJoinOuter" for spark
> ---
>
> Key: PIG-4292
> URL: https://issues.apache.org/jira/browse/PIG-4292
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: TEST-org.apache.pig.test.TestMergeJoinOuter.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PIG-4277) Enable unit test "TestEvalPipeline2" for spark

2017-05-24 Thread Nandor Kollar (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar reassigned PIG-4277:
--

Assignee: (was: Mohit Sabharwal)

> Enable unit test "TestEvalPipeline2" for spark
> --
>
> Key: PIG-4277
> URL: https://issues.apache.org/jira/browse/PIG-4277
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: TEST-org.apache.pig.test.TestEvalPipeline2.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

