[jira] [Updated] (PIG-4196) Auto ship udf jar is broken

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4196:

Attachment: PIG-4196-2.patch

Thanks, you are absolutely right! Revised patch.

> Auto ship udf jar is broken
> ---
>
> Key: PIG-4196
> URL: https://issues.apache.org/jira/browse/PIG-4196
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4196-0.patch, PIG-4196-1.patch, PIG-4196-2.patch
>
>
> The mechanism to ship the jar containing a UDF was broken by PIG-4054. A 
> quick fix is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4215) Fix unit test failure TestParamSubPreproc and TestMacroExpansion

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4215:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and 0.14 branch. Thanks Rohini for review!

> Fix unit test failure TestParamSubPreproc and TestMacroExpansion
> 
>
> Key: PIG-4215
> URL: https://issues.apache.org/jira/browse/PIG-4215
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4215-1.patch
>
>
> Unit tests are broken by PIG-4080. We shall fix them.





[jira] [Updated] (PIG-4175) PIG CROSS operation follow by STORE produces non-deterministic results each run

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4175:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and 0.14 branch. Thanks Rohini for review!

> PIG CROSS operation follow by STORE produces non-deterministic results each 
> run
> ---
>
> Key: PIG-4175
> URL: https://issues.apache.org/jira/browse/PIG-4175
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12.0
> Environment: RHEL 6/64-bit
>Reporter: Jim Huang
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4175-1.patch, mktestdata.py, pig_testcross_plan.png, 
> test_cross.out, test_cross.pig
>
>
> Three files will be attached to help visualize this issue.
> 1. mktestdata.py - to generate test data to feed the pig script
> 2. test_cross.pig - the PIG script using CROSS and STORE
> 3. test_cross.out - the PIG console output showing the input/output records 
> delta
> To reproduce this PIG CROSS operation problem, use the supplied Python 
> script, mktestdata.py, to generate an input file that is at least 
> 13,948,228,930 bytes (> 13GB).
> The CROSS between raw_data (m records) and cross_count (1 record) should 
> yield exactly m records as the output.  
> The STORE results from the CROSS operations yielded only about 1/3 of the 
> input records in raw_data as the output.  
> If I combined both of the CROSS operations into one, the STORE results 
> yielded about 2/3 of the input records in raw_data as the output.  
> -- data = CROSS raw_data, field04s_count, subsection1_field04s_count, 
> subsection2_field04s_count;
> We have reproduced this using both Pig 0.11 (Hadoop 1.x) and Pig 0.12 (Hadoop 
> 2.x) clusters.  
> The default HDFS block size is 128MB.  
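
The expected record count in the report above can be illustrated with a small sketch. This is plain in-memory Java, not Pig's CROSS implementation, and the names `CrossSketch` and `cross` are hypothetical. It only shows why crossing m records with a single record should give exactly m records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a plain in-memory cross product, not Pig's CROSS implementation.
final class CrossSketch {
    // Pairs every element of left with every element of right,
    // so the result always has left.size() * right.size() entries.
    static List<String[]> cross(List<String> left, List<String> right) {
        List<String[]> out = new ArrayList<>(left.size() * right.size());
        for (String l : left) {
            for (String r : right) {
                out.add(new String[] {l, r});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rawData = Arrays.asList("r1", "r2", "r3"); // stands in for m records
        List<String> crossCount = Arrays.asList("3");           // a single-record relation
        // Prints the number of output pairs, which is m * 1.
        System.out.println(cross(rawData, crossCount).size());
    }
}
```

Any output count other than m for a cross with a one-record relation, such as the 1/3 and 2/3 figures reported, therefore points to records being lost rather than to the semantics of CROSS.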





[jira] [Updated] (PIG-4214) Fix unit test fail TestMRJobStats

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4214:

Attachment: PIG-4214-2.patch

Good catch! Revised patch attached.

> Fix unit test fail TestMRJobStats
> -
>
> Key: PIG-4214
> URL: https://issues.apache.org/jira/browse/PIG-4214
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4214-1.patch, PIG-4214-2.patch
>
>
> TestMRJobStats is broken by PIG-4050. We shall fix it.





[jira] [Commented] (PIG-4175) PIG CROSS operation follow by STORE produces non-deterministic results each run

2014-09-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154285#comment-14154285
 ] 

Rohini Palaniswamy commented on PIG-4175:
-

+1

> PIG CROSS operation follow by STORE produces non-deterministic results each 
> run
> ---
>
> Key: PIG-4175
> URL: https://issues.apache.org/jira/browse/PIG-4175
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12.0
> Environment: RHEL 6/64-bit
>Reporter: Jim Huang
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4175-1.patch, mktestdata.py, pig_testcross_plan.png, 
> test_cross.out, test_cross.pig
>
>
> Three files will be attached to help visualize this issue.
> 1. mktestdata.py - to generate test data to feed the pig script
> 2. test_cross.pig - the PIG script using CROSS and STORE
> 3. test_cross.out - the PIG console output showing the input/output records 
> delta
> To reproduce this PIG CROSS operation problem, use the supplied Python 
> script, mktestdata.py, to generate an input file that is at least 
> 13,948,228,930 bytes (> 13GB).
> The CROSS between raw_data (m records) and cross_count (1 record) should 
> yield exactly m records as the output.  
> The STORE results from the CROSS operations yielded only about 1/3 of the 
> input records in raw_data as the output.  
> If I combined both of the CROSS operations into one, the STORE results 
> yielded about 2/3 of the input records in raw_data as the output.  
> -- data = CROSS raw_data, field04s_count, subsection1_field04s_count, 
> subsection2_field04s_count;
> We have reproduced this using both Pig 0.11 (Hadoop 1.x) and Pig 0.12 (Hadoop 
> 2.x) clusters.  
> The default HDFS block size is 128MB.  





[jira] [Commented] (PIG-4212) Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)

2014-09-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154269#comment-14154269
 ] 

Rohini Palaniswamy commented on PIG-4212:
-

Can you just add a newline before @Test before checking in the patch?

> Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)
> --
>
> Key: PIG-4212
> URL: https://issues.apache.org/jira/browse/PIG-4212
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-4212-v1.patch
>
>
> Somehow 
> limit A 0 
> is currently allowed but not
>limit A B.count - B.count 
> I'd like the latter to be also allowed.





[jira] [Commented] (PIG-4196) Auto ship udf jar is broken

2014-09-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154261#comment-14154261
 ] 

Rohini Palaniswamy commented on PIG-4196:
-

Shouldn't it be

{code}
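String jar = JarManager.findContainingJar(clazz);
if (jar != null) {
    // Assumes allJars holds java.net.URL entries; needs java.io.File and java.net.URL.
    URL jarUrl = new File(jar).toURI().toURL();
    if (!allJars.contains(jarUrl)) {
        allJars.add(jarUrl);
    }
}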
{code}

> Auto ship udf jar is broken
> ---
>
> Key: PIG-4196
> URL: https://issues.apache.org/jira/browse/PIG-4196
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4196-0.patch, PIG-4196-1.patch
>
>
> The mechanism to ship the jar containing a UDF was broken by PIG-4054. A 
> quick fix is attached.





[jira] [Commented] (PIG-4214) Fix unit test fail TestMRJobStats

2014-09-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154256#comment-14154256
 ] 

Rohini Palaniswamy commented on PIG-4214:
-

You can use java.util.Arrays.asList(mapTaskReports).iterator() instead of 
org.apache.commons.collections4.iterators.ArrayIterator. The commons-collections4 
dependency is only present with Tez and will not be available for Hadoop 1.x. 
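
A minimal sketch of that suggestion, using only the JDK. The class name `ArrayIterationSketch`, the helper `iteratorOf`, and the `String[]` stand-in for the real report array are assumptions made for illustration; the actual element type in the test is not shown in this thread.

```java
import java.util.Arrays;
import java.util.Iterator;

// Illustrative only: String[] stands in for the real report array type.
final class ArrayIterationSketch {
    // Wraps the array in a fixed-size List view and returns its iterator;
    // the array is not copied and no extra library is needed.
    static <T> Iterator<T> iteratorOf(T[] items) {
        return Arrays.asList(items).iterator();
    }

    public static void main(String[] args) {
        String[] mapTaskReports = {"report-0", "report-1"}; // hypothetical stand-in data
        Iterator<String> it = iteratorOf(mapTaskReports);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Because Arrays.asList returns a view backed by the array, this avoids both a copy and the extra dependency.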

> Fix unit test fail TestMRJobStats
> -
>
> Key: PIG-4214
> URL: https://issues.apache.org/jira/browse/PIG-4214
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4214-1.patch
>
>
> TestMRJobStats is broken by PIG-4050. We shall fix it.





[jira] [Created] (PIG-4216) Making use of Centralized Cache Management in HDFS

2014-09-30 Thread srinivas (JIRA)
srinivas created PIG-4216:
-

 Summary: Making use of  Centralized Cache Management in HDFS
 Key: PIG-4216
 URL: https://issues.apache.org/jira/browse/PIG-4216
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.12.0
Reporter: srinivas


I am working on optimizing joins and came across a new HDFS feature, 
"Centralized Cache Management in HDFS". I cached a dataset in HDFS and ran a 
Pig script join, but I don't see any improvement in performance. I am not sure 
whether this feature is transparent to Pig, with MapReduce taking care of it, 
or whether Pig needs some modifications.


http://www.cloudera.com/content/cloudera/en/documentation/cdh5/latest/CDH5-Installation-Guide/cdh5ig_hdfs_caching.html







[jira] [Commented] (PIG-4215) Fix unit test failure TestParamSubPreproc and TestMacroExpansion

2014-09-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154040#comment-14154040
 ] 

Rohini Palaniswamy commented on PIG-4215:
-

+1

> Fix unit test failure TestParamSubPreproc and TestMacroExpansion
> 
>
> Key: PIG-4215
> URL: https://issues.apache.org/jira/browse/PIG-4215
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4215-1.patch
>
>
> Unit tests are broken by PIG-4080. We shall fix them.





Re: Pig on Spark - Suggestions in handling code changes out of Spark

2014-09-30 Thread Rohini Palaniswamy
Do you want to submit all the changes in one single separate Jira? Will be
easier for us to review.

On Tue, Sep 30, 2014 at 7:01 AM, Praveen R 
wrote:

> *Hi Everyone,*
>
> Earlier we made some changes on
> https://github.com/sigmoidanalytics/spork/tree/spork-pig-12 to achieve
> complete e2e coverage, but we did not restrict ourselves from changing the
> Pig codebase, as we found that slightly easier to do.
>
> We are now working on merging these changes into
> https://github.com/apache/pig/tree/spark and had to revisit them, to
> either find a workaround or propose the change on trunk.
>
> Below is the gist of the code changes made outside the Spark code, for
> which the related code can be found here 
>
>
> 1. Had to comment out PigStatsUtil.addNativeJobStats(PigStats.get(), this,
>    true); to get native (mapred) operator working
> 2. Changes in PigRecordReader to identify endOfAllInput
> 3. POUserFunc - made properties attribute public
> 4. POCollectedGroup - getNextTuple modified to identify the end of all
>    input
> 5. POFRJoin - made LRs attribute public to use it during FR join
> 6. POMergeJoin - made LRs attribute public to use it during merge join
> 7. POStream - problem with identifying endOfAllInput, made some changes
> 8. JsonLoader - made properties public to use from JsonStorage
> 9. JsonStorage - uses properties from JsonLoader
> 10. PigStorage - mRequiredColumns attribute
> 11. BinSedesTuple, BinSedesTupleFactory - made the class serializable
> 12. SchemaTupleBackend - changes to initialize stbInstance when null
>
>
>
> I would like to seek suggestions upfront before I submit the related patches
> and take the discussion forward on an issue-by-issue basis.
>
> BTW, below are the JIRA issues relating to the above changes which I will be
> working on. Anyone interested in taking them up, please feel free to
> comment on the issue.
>
> PIG-4193, PIG-4189, PIG-4190, PIG-4192, PIG-4200, PIG-4207, PIG-4208,
> PIG-4209
>
> Thanks,
> Praveen R
>


[jira] [Updated] (PIG-4196) Auto ship udf jar is broken

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4196:

Status: Patch Available  (was: Open)

> Auto ship udf jar is broken
> ---
>
> Key: PIG-4196
> URL: https://issues.apache.org/jira/browse/PIG-4196
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4196-0.patch, PIG-4196-1.patch
>
>
> The mechanism to ship the jar containing a UDF was broken by PIG-4054. A 
> quick fix is attached.





[jira] [Commented] (PIG-4213) CSVExcelStorage not quoting texts containing \r (CR) when storing

2014-09-30 Thread Alfonso Nishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153753#comment-14153753
 ] 

Alfonso Nishikawa commented on PIG-4213:


[~daijy], Sure! Working on it  (on a suitable computer) :) Many thanks!

> CSVExcelStorage not quoting texts containing \r (CR) when storing
> -
>
> Key: PIG-4213
> URL: https://issues.apache.org/jira/browse/PIG-4213
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.12.0
>Reporter: Alfonso Nishikawa
>Assignee: Alfonso Nishikawa
>Priority: Trivial
> Attachments: PIG-4213v1.patch
>
>
> Managing tweets information I found that someone wrote a multiline tweet in 
> Mac OS 9 (or bellow). When exporting the text, it is not being quoted so 
> LibreOffice can't import the cell properly (don't try Excel 2007 because it's 
> bugged).
> I suggest including the CR case in the same way as commented in 
> http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315





[jira] [Updated] (PIG-4213) CSVExcelStorage not quoting texts containing \r (CR) when storing

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4213:

Assignee: Alfonso Nishikawa

> CSVExcelStorage not quoting texts containing \r (CR) when storing
> -
>
> Key: PIG-4213
> URL: https://issues.apache.org/jira/browse/PIG-4213
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.12.0
>Reporter: Alfonso Nishikawa
>Assignee: Alfonso Nishikawa
>Priority: Trivial
> Attachments: PIG-4213v1.patch
>
>
> Managing tweets information I found that someone wrote a multiline tweet in 
> Mac OS 9 (or bellow). When exporting the text, it is not being quoted so 
> LibreOffice can't import the cell properly (don't try Excel 2007 because it's 
> bugged).
> I suggest including the CR case in the same way as commented in 
> http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315





[jira] [Commented] (PIG-4213) CSVExcelStorage not quoting texts containing \r (CR) when storing

2014-09-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153714#comment-14153714
 ] 

Daniel Dai commented on PIG-4213:
-

Thanks Alfonso, but the patch breaks 3 tests in TestCSVExcelStorage. Can you 
take a look?

> CSVExcelStorage not quoting texts containing \r (CR) when storing
> -
>
> Key: PIG-4213
> URL: https://issues.apache.org/jira/browse/PIG-4213
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.12.0
>Reporter: Alfonso Nishikawa
>Priority: Trivial
> Attachments: PIG-4213v1.patch
>
>
> Managing tweets information I found that someone wrote a multiline tweet in 
> Mac OS 9 (or bellow). When exporting the text, it is not being quoted so 
> LibreOffice can't import the cell properly (don't try Excel 2007 because it's 
> bugged).
> I suggest including the CR case in the same way as commented in 
> http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315





[jira] [Updated] (PIG-4215) Fix unit test failure TestParamSubPreproc and TestMacroExpansion

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4215:

Status: Patch Available  (was: Open)

> Fix unit test failure TestParamSubPreproc and TestMacroExpansion
> 
>
> Key: PIG-4215
> URL: https://issues.apache.org/jira/browse/PIG-4215
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4215-1.patch
>
>
> Unit tests are broken by PIG-4080. We shall fix them.





[jira] [Updated] (PIG-4215) Fix unit test failure TestParamSubPreproc and TestMacroExpansion

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4215:

Attachment: PIG-4215-1.patch

> Fix unit test failure TestParamSubPreproc and TestMacroExpansion
> 
>
> Key: PIG-4215
> URL: https://issues.apache.org/jira/browse/PIG-4215
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4215-1.patch
>
>
> Unit tests are broken by PIG-4080. We shall fix them.





[jira] [Created] (PIG-4215) Fix unit test failure TestParamSubPreproc and TestMacroExpansion

2014-09-30 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4215:
---

 Summary: Fix unit test failure TestParamSubPreproc and 
TestMacroExpansion
 Key: PIG-4215
 URL: https://issues.apache.org/jira/browse/PIG-4215
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0


Unit tests are broken by PIG-4080. We shall fix them.





[jira] [Commented] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153705#comment-14153705
 ] 

Richard Ding commented on PIG-4173:
---

Sorry, I meant PIG-4168.

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
> TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Commented] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153701#comment-14153701
 ] 

Richard Ding commented on PIG-4173:
---

Hi ~praveenr019, 

Since PIG-4186 hasn't been checked in, it seems to make more sense to first 
build with Spark 1.x and then fix PIG-4186. What do you think?

Thanks,
-Richard

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
> TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4173_3.patch

Thanks for the review. The new patch incorporates the changes suggested in the comments. 

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
> TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Updated] (PIG-4214) Fix unit test fail TestMRJobStats

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4214:

Status: Patch Available  (was: Open)

> Fix unit test fail TestMRJobStats
> -
>
> Key: PIG-4214
> URL: https://issues.apache.org/jira/browse/PIG-4214
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4214-1.patch
>
>
> TestMRJobStats is broken by PIG-4050. We shall fix it.





[jira] [Created] (PIG-4214) Fix unit test fail TestMRJobStats

2014-09-30 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4214:
---

 Summary: Fix unit test fail TestMRJobStats
 Key: PIG-4214
 URL: https://issues.apache.org/jira/browse/PIG-4214
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0
 Attachments: PIG-4214-1.patch

TestMRJobStats is broken by PIG-4050. We shall fix it.





[jira] [Updated] (PIG-4214) Fix unit test fail TestMRJobStats

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4214:

Attachment: PIG-4214-1.patch

> Fix unit test fail TestMRJobStats
> -
>
> Key: PIG-4214
> URL: https://issues.apache.org/jira/browse/PIG-4214
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4214-1.patch
>
>
> TestMRJobStats is broken by PIG-4050. We shall fix it.





[jira] [Updated] (PIG-4213) CSVExcelStorage not quoting texts containing \r (CR) when storing

2014-09-30 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated PIG-4213:
---
Description: 
Managing tweets information I found that someone wrote a multiline tweet in Mac 
OS 9 (or bellow). When exporting the text, it is not being quoted so 
LibreOffice can't import the cell properly (don't try Excel 2007 because it's 
bugged).

I suggest including the CR case in the same way as commented in 
http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315

  was:
Managing tweets information I found that someone wrote a multiline tweet in Mac 
OS 9 (or bellow). When exporting the text is not bein quoted so LibreOffice 
can't import the cell properly (don't try Excel 2007 because it's bugged).

I suggest including the CR case in the same way as commented in 
http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315


> CSVExcelStorage not quoting texts containing \r (CR) when storing
> -
>
> Key: PIG-4213
> URL: https://issues.apache.org/jira/browse/PIG-4213
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.12.0
>Reporter: Alfonso Nishikawa
>Priority: Trivial
> Attachments: PIG-4213v1.patch
>
>
> Managing tweets information I found that someone wrote a multiline tweet in 
> Mac OS 9 (or bellow). When exporting the text, it is not being quoted so 
> LibreOffice can't import the cell properly (don't try Excel 2007 because it's 
> bugged).
> I suggest including the CR case in the same way as commented in 
> http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315





[jira] [Commented] (PIG-4213) CSVExcelStorage not quoting texts containing \r (CR) when storing

2014-09-30 Thread Alfonso Nishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153625#comment-14153625
 ] 

Alfonso Nishikawa commented on PIG-4213:


Erratum for the comment above: 'TestLoader' should read 'TextLoader'.

> CSVExcelStorage not quoting texts containing \r (CR) when storing
> -
>
> Key: PIG-4213
> URL: https://issues.apache.org/jira/browse/PIG-4213
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.12.0
>Reporter: Alfonso Nishikawa
>Priority: Trivial
> Attachments: PIG-4213v1.patch
>
>
> Managing tweets information I found that someone wrote a multiline tweet in 
> Mac OS 9 (or bellow). When exporting the text is not bein quoted so 
> LibreOffice can't import the cell properly (don't try Excel 2007 because it's 
> bugged).
> I suggest including the CR case in the same way as commented in 
> http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315





[jira] [Updated] (PIG-4213) CSVExcelStorage not quoting texts containing \r (CR) when storing

2014-09-30 Thread Alfonso Nishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfonso Nishikawa updated PIG-4213:
---
Attachment: PIG-4213v1.patch

I have uploaded a proposed patch (PIG-4213v1.patch), tests included (I tried my 
best). I assume TestLoader does not break lines on CR (\r), but the tests are 
written to expose me if I am wrong ;)
I couldn't run them since I am not at home and this Celeron from 2005 in front 
of me can't handle the build...

> CSVExcelStorage not quoting texts containing \r (CR) when storing
> -
>
> Key: PIG-4213
> URL: https://issues.apache.org/jira/browse/PIG-4213
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.12.0
>Reporter: Alfonso Nishikawa
>Priority: Trivial
> Attachments: PIG-4213v1.patch
>
>
> Managing tweets information I found that someone wrote a multiline tweet in 
> Mac OS 9 (or bellow). When exporting the text is not bein quoted so 
> LibreOffice can't import the cell properly (don't try Excel 2007 because it's 
> bugged).
> I suggest including the CR case in the same way as commented in 
> http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315





[jira] [Updated] (PIG-4084) Port TestPigRunner to Tez

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4084:

Attachment: PIG-4084-1.patch

> Port TestPigRunner to Tez
> -
>
> Key: PIG-4084
> URL: https://issues.apache.org/jira/browse/PIG-4084
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4084-1.patch, initial.patch
>
>






[jira] [Updated] (PIG-4084) Port TestPigRunner to Tez

2014-09-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4084:

Status: Patch Available  (was: Open)

> Port TestPigRunner to Tez
> -
>
> Key: PIG-4084
> URL: https://issues.apache.org/jira/browse/PIG-4084
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4084-1.patch, initial.patch
>
>






[jira] [Commented] (PIG-4212) Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)

2014-09-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153495#comment-14153495
 ] 

Daniel Dai commented on PIG-4212:
-

+1

> Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)
> --
>
> Key: PIG-4212
> URL: https://issues.apache.org/jira/browse/PIG-4212
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-4212-v1.patch
>
>
> Somehow 
> limit A 0 
> is currently allowed but not
>limit A B.count - B.count 
> I'd like the latter to be also allowed.





[jira] [Created] (PIG-4213) CSVExcelStorage not quoting texts containing \r (CR) when storing

2014-09-30 Thread Alfonso Nishikawa (JIRA)
Alfonso Nishikawa created PIG-4213:
--

 Summary: CSVExcelStorage not quoting texts containing \r (CR) when 
storing
 Key: PIG-4213
 URL: https://issues.apache.org/jira/browse/PIG-4213
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Affects Versions: 0.12.0
Reporter: Alfonso Nishikawa
Priority: Trivial


Managing tweets information I found that someone wrote a multiline tweet in Mac 
OS 9 (or bellow). When exporting the text is not bein quoted so LibreOffice 
can't import the cell properly (don't try Excel 2007 because it's bugged).

I suggest including the CR case in the same way as commented in 
http://svn.apache.org/viewvc/pig/tags/release-0.12.1/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup#l315
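
The requested behaviour can be sketched as a generic CSV quoting rule. This is not the CSVExcelStorage source; the class `CsvQuoteSketch` and the method `quoteIfNeeded` are hypothetical names, and the rule shown (quote on delimiter, double quote, LF, or CR, and double any embedded quotes) is the common CSV convention rather than a statement about what the piggybank code currently does.

```java
// Illustrative only: a generic CSV quoting rule, not the CSVExcelStorage source.
final class CsvQuoteSketch {
    // Quotes a field when it contains the delimiter, a double quote,
    // a line feed, or a carriage return; embedded quotes are doubled.
    static String quoteIfNeeded(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0; // the CR case requested in this issue
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // A field containing a bare CR comes back wrapped in double quotes.
        System.out.println(quoteIfNeeded("line one\rline two", ','));
    }
}
```

Treating CR the same way as LF keeps a multiline value written with old Mac OS line endings inside a single quoted cell.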





[jira] [Updated] (PIG-4212) Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)

2014-09-30 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-4212:
--
Status: Patch Available  (was: Open)

> Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)
> --
>
> Key: PIG-4212
> URL: https://issues.apache.org/jira/browse/PIG-4212
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-4212-v1.patch
>
>
> Somehow 
> limit A 0 
> is currently allowed but not
>limit A B.count - B.count 
> I'd like the latter to be also allowed.





[jira] [Updated] (PIG-4212) Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)

2014-09-30 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-4212:
--
Attachment: pig-4212-v1.patch

Straightforward patch that accepts 0 for variableLimit.

> Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)
> --
>
> Key: PIG-4212
> URL: https://issues.apache.org/jira/browse/PIG-4212
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-4212-v1.patch
>
>
> Somehow 
> limit A 0 
> is currently allowed but not
>limit A B.count - B.count 
> I'd like the latter to be also allowed.





[jira] [Created] (PIG-4212) Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)

2014-09-30 Thread Koji Noguchi (JIRA)
Koji Noguchi created PIG-4212:
-

 Summary: Allow LIMIT of 0 for variableLimit (constant 0 is already 
allowed)
 Key: PIG-4212
 URL: https://issues.apache.org/jira/browse/PIG-4212
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Trivial


Somehow 
limit A 0 
is currently allowed but not
   limit A B.count - B.count 

I'd like the latter to be also allowed.





[jira] [Commented] (PIG-4212) Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)

2014-09-30 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153286#comment-14153286
 ] 

Koji Noguchi commented on PIG-4212:
---

One of our users has logic in their script that requires a LIMIT of 0.

This user wants to store at most N outputs in total from contentA and contentB, 
giving preference to contentA. 
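{noformat}
limitContentA = limit contentA N; 
groupedContent_A = GROUP limitContentA ALL;
-- assumed missing step: count how many rows were actually taken from contentA
totalStoredContent_A = FOREACH groupedContent_A GENERATE COUNT(limitContentA);
limitContentB = limit contentB N - totalStoredContent_A.$0;

... store both limitContentA and limitContentB
{noformat}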
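
The intent of the script above can be sketched in plain Java to show why a limit of 0 has to be legal. This is only an illustration under assumptions: LimitSketch and takeWithPreference are hypothetical names, in-memory lists stand in for relations, and n is assumed to be non-negative. When contentA alone fills the quota, the remaining limit for contentB is exactly 0.

```java
import java.util.ArrayList;
import java.util.List;

final class LimitSketch {
    // Hypothetical helper: take up to n items in total, preferring contentA.
    static <T> List<T> takeWithPreference(List<T> contentA, List<T> contentB, int n) {
        // Equivalent of "limit contentA N".
        List<T> out = new ArrayList<>(contentA.subList(0, Math.min(n, contentA.size())));
        // Equivalent of "N - count of stored A"; this is 0 when A fills the quota.
        int remaining = n - out.size();
        // Equivalent of "limit contentB <remaining>", which must accept 0.
        out.addAll(contentB.subList(0, Math.min(remaining, contentB.size())));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(takeWithPreference(List.of("a1", "a2", "a3"), List.of("b1"), 3));
        System.out.println(takeWithPreference(List.of("a1"), List.of("b1", "b2", "b3"), 3));
    }
}
```

In the first call the remaining limit is 0 and nothing is taken from contentB, which corresponds to the case the variable LIMIT currently rejects.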


> Allow LIMIT of 0 for variableLimit (constant 0 is already allowed)
> --
>
> Key: PIG-4212
> URL: https://issues.apache.org/jira/browse/PIG-4212
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
>
> Somehow 
> limit A 0 
> is currently allowed but not
>limit A B.count - B.count 
> I'd like the latter to be also allowed.





Pig on Spark - Suggestions in handling code changes out of Spark

2014-09-30 Thread Praveen R
*Hi Everyone,*

Earlier we made some changes on
https://github.com/sigmoidanalytics/spork/tree/spork-pig-12 to achieve
complete e2e coverage. We did not restrict ourselves to the Spark code and
also changed the Pig codebase, as we found that slightly easier to do.

We are now working on merging these changes to
https://github.com/apache/pig/tree/spark and have to revisit them: for each
one, either find a workaround or propose the change on trunk.

Below is the gist of the code changes made outside the Spark code, for which
the related code can be found here 


   1. Had to comment out PigStatsUtil.addNativeJobStats(PigStats.get(), this,
      true); to get native (mapred) operator working
   2. Changes in PigRecordReader to identify endOfAllInput
   3. POUserFunc - made properties attribute public
   4. POCollectedGroup - getNextTuple modified to identify the end of all input
   5. POFRJoin - made LRs attribute public to use it during FR join
   6. POMergeJoin - made LRs attribute public to use it during merge join
   7. POStream - problem with identifying endOfAllInput, made some changes
   8. JsonLoader - made properties public to use from JsonStorage
   9. JsonStorage - uses properties from JsonLoader
   10. PigStorage - mRequiredColumns attribute
   11. BinSedesTuple, BinSedesTupleFactory - made the class serializable
   12. SchemaTupleBackend - changes to initialize stbInstance when null



I would like to seek suggestions up front before I submit the related patches
and take the discussion forward on a per-issue basis.

BTW, below are the JIRA issues relating to the above changes which I will be
working on. Anyone interested in taking one up, please feel free to comment
on the issue.

PIG-4193, PIG-4189, PIG-4190, PIG-4192, PIG-4200, PIG-4207, PIG-4208,
PIG-4209

Thanks,
Praveen R


[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Praveen Rachabattuni (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen Rachabattuni updated PIG-4173:
--
Attachment: TEST-org.apache.pig.spark.TestSpark.txt

Hi Richard, I just found that some of the unit tests from PIG-4168 don't pass 
after the Spark-1.x upgrade. Please check the attached log.

I believe it would be good to have these unit tests passing before the merge.

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, 
> TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Commented] (PIG-4174) e2e tests for Spark

2014-09-30 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152897#comment-14152897
 ] 

liyunzhang_intel commented on PIG-4174:
---

If the hadoop command is available in your shell environment:
#which hadoop
$HADOOP_HOME/bin/hadoop

you can run the e2e tests as follows:
1.  Generate data 
a)  ant  -Dharness.cluster.conf=$HADOOP_CONF_DIR 
-Dharness.cluster.bin=$HADOOP_BIN test-e2e-deploy-local  
b)  hadoop fs -put test/e2e/pig/testdist/data ./  
2.  Run particular tests
a)  patch -p1

> e2e tests for Spark
> ---
>
> Key: PIG-4174
> URL: https://issues.apache.org/jira/browse/PIG-4174
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
> Attachments: PIG-4174-1.patch, PIG-4174_1.patch, copy.data.to.hdfs.log
>
>
> Set up e2e tests for Pig on Spark, like those for Pig on MapReduce and Pig on Tez.
> Steps to set up e2e tests:
> 1. Initialize Variables
> export OLD_PIG_HOME=/usr/local/Cellar/pig/0.12.0 # Should rather be 14
> export HADOOP_CONF_DIR=/usr/local/Cellar/hadoop/1.0.4/conf
> export HADOOP_BIN=/usr/local/Cellar/hadoop/1.0.4/bin/hadoop
> 2. Generate Data
> ant -Dharness.old.pig=$OLD_PIG_HOME -Dharness.cluster.conf=$HADOOP_CONF_DIR 
> -Dharness.cluster.bin=$HADOOP_BIN test-e2e-deploy-local
> (You might want to install the necessary CPAN modules in case of any dependency 
> errors 
> https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-End-to-endTesting)
> Copy data to hdfs to use with Spark
> hadoop fs -put test/e2e/pig/testdist/data ./
> 3. Run particular test
> ant -Dharness.old.pig=$OLD_PIG_HOME -Dharness.cluster.conf=$HADOOP_CONF_DIR 
> -Dharness.cluster.bin=$HADOOP_BIN -Dtests.to.run="-t Checkin_1" test-e2e-spark
> 4. Run all tests
> ant -Dharness.old.pig=$OLD_PIG_HOME -Dharness.cluster.conf=$HADOOP_CONF_DIR 
> -Dharness.cluster.bin=$HADOOP_BIN test-e2e-spark





[jira] [Updated] (PIG-4174) e2e tests for Spark

2014-09-30 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4174:
--
Attachment: PIG-4174_1.patch

> e2e tests for Spark
> ---
>
> Key: PIG-4174
> URL: https://issues.apache.org/jira/browse/PIG-4174
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
> Attachments: PIG-4174-1.patch, PIG-4174_1.patch, copy.data.to.hdfs.log
>
>
> Set up e2e tests for Pig on Spark, like those for Pig on MapReduce and Pig on Tez.
> Steps to set up e2e tests:
> 1. Initialize Variables
> export OLD_PIG_HOME=/usr/local/Cellar/pig/0.12.0 # Should rather be 14
> export HADOOP_CONF_DIR=/usr/local/Cellar/hadoop/1.0.4/conf
> export HADOOP_BIN=/usr/local/Cellar/hadoop/1.0.4/bin/hadoop
> 2. Generate Data
> ant -Dharness.old.pig=$OLD_PIG_HOME -Dharness.cluster.conf=$HADOOP_CONF_DIR 
> -Dharness.cluster.bin=$HADOOP_BIN test-e2e-deploy-local
> (You might want to install the necessary CPAN modules in case of any dependency 
> errors 
> https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-End-to-endTesting)
> Copy data to hdfs to use with Spark
> hadoop fs -put test/e2e/pig/testdist/data ./
> 3. Run particular test
> ant -Dharness.old.pig=$OLD_PIG_HOME -Dharness.cluster.conf=$HADOOP_CONF_DIR 
> -Dharness.cluster.bin=$HADOOP_BIN -Dtests.to.run="-t Checkin_1" test-e2e-spark
> 4. Run all tests
> ant -Dharness.old.pig=$OLD_PIG_HOME -Dharness.cluster.conf=$HADOOP_CONF_DIR 
> -Dharness.cluster.bin=$HADOOP_BIN test-e2e-spark


