Re: Run a job async

2013-01-25 Thread Cheolsoo Park
Thank you for the suggestions. I will file a jira and add our discussion
there.


On Fri, Jan 25, 2013 at 4:23 PM, Rohini Palaniswamy  wrote:

> Jon,
>   Those are good areas to check. A few things I have seen regarding those:
>
>  1) JythonScriptEngine - PythonInterpreter is static and is not suitable for
> multiple runs if the script names are the same (hit this issue in PIG-2433 unit
> tests).
>  2) QueryParserDriver - There is a static cache mapping macro names to macro
> files, so the same macro name with different file locations will cause
> problems.
>  3) FileLocalizer.relativeRoot - No issues with a single cluster. It just needs
> to be reinitialized if multiple clusters are supported.
>
> Regards,
> Rohini
>
>
> On Fri, Jan 25, 2013 at 9:37 AM, Jonathan Coveney wrote:
>
> > user to bcc, +dev
> >
> > Cheolsoo,
> >
> > Can you make a JIRA for this? I can imagine a slightly heavier test suite,
> > but I like where you started. If it's not far off, then I think it'll be a
> > win to make it thread safe. But we need to make sure to test the most
> > advanced features...UDFs (esp. the same name but a different UDF in different
> > invocations), scripting UDFs (same thing), and so on.
> >
> >
> > 2013/1/25 Cheolsoo Park 
> >
> > > >> if you have multiple threads that run a query via PigServer, there is a
> > > >> great chance of the internals clashing because of the use of static
> > > >> variables within Pig.
> > >
> > > Recently, I spent some time on this, and what I found is that the Pig
> > > front-end is quite thread-safe. Here is how I tested it:
> > >
> > > 1) Wrote a PigUnit test that runs in MR mode.
> > > 2) Executed test cases concurrently in 4 threads using a JUnit extension
> > > called tempus-fugit:
> > > http://tempusfugitlibrary.org/documentation/junit/parallel/
> > >
> > > After fixing PIG-3096, I was able to successfully run Pig queries in
> > > parallel. It's important to note that only the front-end needs to be
> > > thread-safe since that's what is executed in parallel.
> > >
> > > I arbitrarily selected queries from e2e test cases, so they are probably
> > > not complex enough to mimic real-world examples. Nevertheless, my test
> > > program ran without a problem for a few days. I couldn't continue my
> > > experiment because I was pulled into something else. However, I think
> > > that making the front-end thread-safe is an achievable goal.
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > >
> > >
> > > On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam
> > > wrote:
> > >
> > > > That clarifies it for me, thanks a lot.
> > > >
> > > > Regards,
> > > > Rama.
> > > >
> > > >
> > > > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <jcove...@gmail.com>
> > > > wrote:
> > > >
> > > > > Well, when I say that Pig is not multi-threaded, what I mean is that if
> > > > > you have multiple threads that run a query via PigServer, there is a
> > > > > great chance of the internals clashing because of the use of static
> > > > > variables within Pig. Pig itself, when running a single query, is
> > > > > multi-threaded. It's just not "multi-threaded" in the sense that
> > > > > multiple instances can safely be run in the same JVM.
> > > > >
> > > > >
> > > > > 2013/1/24 Ramakrishna Nalam 
> > > > >
> > > > > > Hi Jonathan,
> > > > > >
> > > > > > Pardon me if it's a naive question, but it's interesting that you say
> > > > > > Pig is not multithreaded.
> > > > > > We're using Pig 0.10.0, and looking at the code, it seems to do the
> > > > > > right things to handle multi-threaded requests (ThreadLocal for
> > > > > > ScriptState, for example).
> > > > > >
> > > > > > It would be great if you could point out the kind of issues there could
> > > > > > be.
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Rama.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <lefthandma...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Are there any plans to make the pigserver multi-threaded?
> > > > > > >
> > > > > > > Since there is "PigProcessNotificationListener" to subscribe for async
> > > > > > > callbacks when the pig job completes, is there any real need to keep
> > > > > > > the thread that submits the pig job waiting until the job completes?
> > > > > > >
> > > > > > > Is this just a shortcoming today, or are there more concrete reasons
> > > > > > > against providing a pigserver that can submit to the cluster in
> > > > > > > mapreduce mode asynchronously?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Praveen
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcove...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > > > > > asynchronously is going to be a bear. I mean, this is essentially

[jira] [Updated] (PIG-3138) Decouple PigServer.executeBatch() from compilation of batch

2013-01-25 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3138:
-

Attachment: PIG-3138.patch

Adding a patch, please review.

> Decouple PigServer.executeBatch() from compilation of batch
> ---
>
> Key: PIG-3138
> URL: https://issues.apache.org/jira/browse/PIG-3138
> Project: Pig
>  Issue Type: Improvement
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3138.patch
>
>
> executeBatch() currently does parsing and building of LogicalPlan in addition 
> to the actual execution. It will be beneficial to separate out 
> parsing/building from execution - that will allow us to get a handle on 
> load/store and other operators before execution of batch. Useful for folks 
> using PigServer API.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (PIG-3138) Decouple PigServer.executeBatch() from compilation of batch

2013-01-25 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi reassigned PIG-3138:


Assignee: Prashant Kommireddi

> Decouple PigServer.executeBatch() from compilation of batch
> ---
>
> Key: PIG-3138
> URL: https://issues.apache.org/jira/browse/PIG-3138
> Project: Pig
>  Issue Type: Improvement
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
>
> executeBatch() currently does parsing and building of LogicalPlan in addition 
> to the actual execution. It will be beneficial to separate out 
> parsing/building from execution - that will allow us to get a handle on 
> load/store and other operators before execution of batch. Useful for folks 
> using PigServer API.



[jira] [Created] (PIG-3138) Decouple PigServer.executeBatch() from compilation of batch

2013-01-25 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-3138:


 Summary: Decouple PigServer.executeBatch() from compilation of 
batch
 Key: PIG-3138
 URL: https://issues.apache.org/jira/browse/PIG-3138
 Project: Pig
  Issue Type: Improvement
Reporter: Prashant Kommireddi
 Fix For: 0.12


executeBatch() currently does parsing and building of LogicalPlan in addition 
to the actual execution. It will be beneficial to separate out parsing/building 
from execution - that will allow us to get a handle on load/store and other 
operators before execution of batch. Useful for folks using PigServer API.
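
To illustrate the split proposed above, here is a minimal Java sketch. It does
not use the real PigServer API: DecoupledBatchSketch, Plan, compile() and
execute() are hypothetical names made up for this example, and the "parser" is
a toy that only recognizes LOAD and STORE statements. The only point is that a
caller can inspect load/store locations after compilation and before anything
runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these types are NOT part of Pig's API.
final class DecoupledBatchSketch {

    /** Toy stand-in for a compiled plan: just the load and store locations. */
    static final class Plan {
        final List<String> loads = new ArrayList<>();
        final List<String> stores = new ArrayList<>();
    }

    /** Phase 1: parse the statements and build the plan without running it. */
    static Plan compile(List<String> statements) {
        Plan plan = new Plan();
        for (String statement : statements) {
            String upper = statement.trim().toUpperCase();
            if (upper.startsWith("STORE ")) {
                plan.stores.add(firstQuoted(statement));
            } else if (upper.contains("LOAD ")) {
                plan.loads.add(firstQuoted(statement));
            }
        }
        return plan;
    }

    /** Phase 2: "run" a compiled plan; the toy only counts the store operations. */
    static int execute(Plan plan) {
        return plan.stores.size();
    }

    // Returns the text between the first pair of single quotes.
    private static String firstQuoted(String statement) {
        int start = statement.indexOf('\'');
        int end = statement.indexOf('\'', start + 1);
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("no quoted location in: " + statement);
        }
        return statement.substring(start + 1, end);
    }

    public static void main(String[] args) {
        Plan plan = compile(Arrays.asList(
                "a = LOAD 'input/data' AS (x:int);",
                "STORE a INTO 'output/result';"));
        // The caller can look at the operators before deciding to execute.
        System.out.println("loads=" + plan.loads + " stores=" + plan.stores);
        System.out.println("stores executed: " + execute(plan));
    }
}
```

With this shape, a PigServer API user could reject or rewrite a batch based on
its store locations before any job is submitted.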



[jira] [Created] (PIG-3137) fix Piggybank test to not using /tmp dir

2013-01-25 Thread Johnny Zhang (JIRA)
Johnny Zhang created PIG-3137:
-

 Summary: fix Piggybank test to not using /tmp dir
 Key: PIG-3137
 URL: https://issues.apache.org/jira/browse/PIG-3137
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.11
Reporter: Johnny Zhang
 Fix For: 0.12


Right now several Piggybank tests create a directory under /tmp to store test 
data. These tests can fail when the user doesn't have permission to create a 
directory under /tmp. It is better to move the test data dir under the build 
dir to avoid this problem.
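
As a rough illustration of the idea (not the actual patch), a test helper could
resolve its data directory from a build property instead of hard-coding /tmp.
The class name, the "test.build.dir" property and the "build/test" default are
all assumptions made for this sketch; the real build may use different names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not taken from the Piggybank test code.
final class TestDataDirSketch {

    // Assumption: the build may pass -Dtest.build.dir=<dir>; "build/test" is a
    // made-up default used when the property is absent.
    static Path testDataDir(String testName) {
        String base = System.getProperty("test.build.dir", "build/test");
        Path dir = Paths.get(base, "data", testName);
        try {
            // Creates missing parent directories; a no-op if the directory exists.
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(testDataDir("ExampleTest").toAbsolutePath());
    }
}
```

Because the directory lives under the build tree, it is writable by whoever
runs the build and is removed along with other build output.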

I will submit a patch soon.



Re: Pig 11.0

2013-01-25 Thread Cheolsoo Park
I will also run e2e on Hadoop-1.x and Hadoop-2.x.


On Fri, Jan 25, 2013 at 5:02 PM, Daniel Dai  wrote:

> I will run e2e tests on Hadoop 1.x over the weekend.
>
> Thanks,
> Daniel
>
> On Fri, Jan 25, 2013 at 4:27 PM, Rohini Palaniswamy
>  wrote:
> >  That's good :). Unit tests have all been passing. I haven't run e2e tests
> > on pig 0.11 for some time. Will kick off one this weekend. It would be nice
> > if Cheolsoo and Daniel can also kick off one run.
> >
> > Regards,
> > Rohini
> >
> >
> > On Fri, Jan 25, 2013 at 4:08 PM, Julien Le Dem wrote:
> >
> >> It looks like all the tickets for Pig 0.11 have been resolved as of today.
> >> See:
> >>
> >> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%220.11%22%20AND%20project%20%3D%20PIG%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20created%20ASC%2C%20priority%20DESC
> >>
> >> I propose we make the release 0.11.0 next week.
> >>
> >> Julien
> >>
>


Re: Pig 11.0

2013-01-25 Thread Daniel Dai
I will run e2e tests on Hadoop 1.x over the weekend.

Thanks,
Daniel

On Fri, Jan 25, 2013 at 4:27 PM, Rohini Palaniswamy
 wrote:
>  That's good :). Unit tests have all been passing. I haven't run e2e tests
> on pig 0.11 for some time. Will kick off one this weekend. It would be nice
> if Cheolsoo and Daniel can also kick off one run.
>
> Regards,
> Rohini
>
>
> On Fri, Jan 25, 2013 at 4:08 PM, Julien Le Dem  wrote:
>
>> It looks like all the tickets for Pig 0.11 have been resolved as of today.
>> See:
>>
>> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%220.11%22%20AND%20project%20%3D%20PIG%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20created%20ASC%2C%20priority%20DESC
>>
>> I propose we make the release 0.11.0 next week.
>>
>> Julien
>>


[jira] Subscription: PIG patch available

2013-01-25 Thread jira
Issue Subscription
Filter: PIG patch available (23 issues)

Subscriber: pigdaily

Key         Summary
PIG-3136    Introduce a syntax making declared aliases optional
            https://issues.apache.org/jira/browse/PIG-3136
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3122    Operators should not implicitly become reserved keywords
            https://issues.apache.org/jira/browse/PIG-3122
PIG-3114    Duplicated macro name error when using pigunit
            https://issues.apache.org/jira/browse/PIG-3114
PIG-3108    HBaseStorage returns empty maps when mixing wildcard- with other columns
            https://issues.apache.org/jira/browse/PIG-3108
PIG-3105    Fix TestJobSubmission unit test failure.
            https://issues.apache.org/jira/browse/PIG-3105
PIG-3098    Add another test for the self join case
            https://issues.apache.org/jira/browse/PIG-3098
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


Re: Pig 11.0

2013-01-25 Thread Rohini Palaniswamy
 That's good :). Unit tests have all been passing. I haven't run e2e tests
on pig 0.11 for some time. Will kick off one this weekend. It would be nice
if Cheolsoo and Daniel can also kick off one run.

Regards,
Rohini


On Fri, Jan 25, 2013 at 4:08 PM, Julien Le Dem  wrote:

> It looks like all the tickets for Pig 0.11 have been resolved as of today.
> See:
>
> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%220.11%22%20AND%20project%20%3D%20PIG%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20created%20ASC%2C%20priority%20DESC
>
> I propose we make the release 0.11.0 next week.
>
> Julien
>


Re: Run a job async

2013-01-25 Thread Rohini Palaniswamy
Jon,
  Those are good areas to check. A few things I have seen regarding those:

 1) JythonScriptEngine - PythonInterpreter is static and is not suitable for
multiple runs if the script names are the same (hit this issue in PIG-2433 unit
tests).
 2) QueryParserDriver - There is a static cache mapping macro names to macro
files, so the same macro name with different file locations will cause
problems.
 3) FileLocalizer.relativeRoot - No issues with a single cluster. It just needs
to be reinitialized if multiple clusters are supported.
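
To make point 2 concrete, here is a small Java sketch of how a JVM-wide static
cache lets one run leak into another. It is not the QueryParserDriver code:
MacroCacheSketch and its methods are hypothetical, and the sketch assumes the
cache simply keeps the first mapping it sees (the real code may behave
differently, for instance by reporting a duplicate macro).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the actual Pig parser code.
final class MacroCacheSketch {

    // One map for the whole JVM. It is thread-safe, but every run still shares it.
    private static final Map<String, String> SHARED = new ConcurrentHashMap<>();

    // One map per instance (per run), so runs cannot see each other's macros.
    private final Map<String, String> local = new HashMap<>();

    /** Looks up a macro in the JVM-wide cache; the first registered file wins. */
    String resolveShared(String macroName, String file) {
        return SHARED.computeIfAbsent(macroName, name -> file);
    }

    /** Looks up a macro in this run's own cache. */
    String resolveLocal(String macroName, String file) {
        return local.computeIfAbsent(macroName, name -> file);
    }

    public static void main(String[] args) {
        MacroCacheSketch runA = new MacroCacheSketch();
        MacroCacheSketch runB = new MacroCacheSketch();

        runA.resolveShared("my_macro", "/scripts/a/macros.pig");
        // Run B asked for its own file but is handed run A's file.
        System.out.println(runB.resolveShared("my_macro", "/scripts/b/macros.pig"));

        runA.resolveLocal("my_macro", "/scripts/a/macros.pig");
        // With per-run state, run B gets the file it asked for.
        System.out.println(runB.resolveLocal("my_macro", "/scripts/b/macros.pig"));
    }
}
```

Note that making the shared map thread-safe does not fix the collision; the
state has to be scoped to a single run.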

Regards,
Rohini


On Fri, Jan 25, 2013 at 9:37 AM, Jonathan Coveney wrote:

> user to bcc, +dev
>
> Cheolsoo,
>
> Can you make a JIRA for this? I can imagine a slightly heavier test suite,
> but I like where you started. If it's not far off, then I think it'll be a
> win to make it thread safe. But we need to make sure to test the most
> advanced features...UDFs (esp. the same name but a different UDF in different
> invocations), scripting UDFs (same thing), and so on.
>
>
> 2013/1/25 Cheolsoo Park 
>
> > >> if you have multiple threads that run a query via PigServer, there is a
> > >> great chance of the internals clashing because of the use of static
> > >> variables within Pig.
> >
> > Recently, I spent some time on this, and what I found is that the Pig
> > front-end is quite thread-safe. Here is how I tested it:
> >
> > 1) Wrote a PigUnit test that runs in MR mode.
> > 2) Executed test cases concurrently in 4 threads using a JUnit extension
> > called tempus-fugit:
> > http://tempusfugitlibrary.org/documentation/junit/parallel/
> >
> > After fixing PIG-3096, I was able to successfully run Pig queries in
> > parallel. It's important to note that only the front-end needs to be
> > thread-safe since that's what is executed in parallel.
> >
> > I arbitrarily selected queries from e2e test cases, so they are probably
> > not complex enough to mimic real-world examples. Nevertheless, my test
> > program ran without a problem for a few days. I couldn't continue my
> > experiment because I was pulled into something else. However, I think
> > that making the front-end thread-safe is an achievable goal.
> >
> > Thanks,
> > Cheolsoo
> >
> >
> >
> > On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam
> > wrote:
> >
> > > That clarifies it for me, thanks a lot.
> > >
> > > Regards,
> > > Rama.
> > >
> > >
> > > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney wrote:
> > >
> > > > Well, when I say that Pig is not multi-threaded, what I mean is that if
> > > > you have multiple threads that run a query via PigServer, there is a
> > > > great chance of the internals clashing because of the use of static
> > > > variables within Pig. Pig itself, when running a single query, is
> > > > multi-threaded. It's just not "multi-threaded" in the sense that
> > > > multiple instances can safely be run in the same JVM.
> > > >
> > > >
> > > > 2013/1/24 Ramakrishna Nalam 
> > > >
> > > > > Hi Jonathan,
> > > > >
> > > > > Pardon me if it's a naive question, but it's interesting that you say
> > > > > Pig is not multithreaded.
> > > > > We're using Pig 0.10.0, and looking at the code, it seems to do the
> > > > > right things to handle multi-threaded requests (ThreadLocal for
> > > > > ScriptState, for example).
> > > > >
> > > > > It would be great if you could point out the kind of issues there could
> > > > > be.
> > > > >
> > > > >
> > > > > Regards,
> > > > > Rama.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <lefthandma...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Are there any plans to make the pigserver multi-threaded?
> > > > > >
> > > > > > Since there is "PigProcessNotificationListener" to subscribe for async
> > > > > > callbacks when the pig job completes, is there any real need to keep
> > > > > > the thread that submits the pig job waiting until the job completes?
> > > > > >
> > > > > > Is this just a shortcoming today, or are there more concrete reasons
> > > > > > against providing a pigserver that can submit to the cluster in
> > > > > > mapreduce mode asynchronously?
> > > > > >
> > > > > > Thanks,
> > > > > > Praveen
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcove...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > > > > asynchronously is going to be a bear. I mean, this is essentially what
> > > > > > > the job tracker does, albeit with a lot less information.
> > > > > > >
> > > > > > > Either way, Pig is not multi-threaded so having more than one instance
> > > > > > > of Pig in the same JVM is going to start causing problems (which is
> > > > > > > why, I imagine, there is no async way to call Pig). So multiple
> > > > > > > processes is really the only way around it that I know of.
> > > > >

Pig 11.0

2013-01-25 Thread Julien Le Dem
It looks like all the tickets for Pig 0.11 have been resolved as of today.
See:
https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%220.11%22%20AND%20project%20%3D%20PIG%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20created%20ASC%2C%20priority%20DESC

I propose we make the release 0.11.0 next week.

Julien


Build failed in Jenkins: Pig-trunk #1394

2013-01-25 Thread Apache Jenkins Server
See 

Changes:

[jcoveney] PIG-2764: Add a biginteger and bigdecimal type to pig (jcoveney)

[gates] PIG-2645 PigSplit does not handle the case where SerializationFactory 
returns null

--
[...truncated 2580 lines...]
[ivy:resolve]   found tomcat#jasper-compiler;5.5.12 in default
[ivy:resolve]   found org.mortbay.jetty#jsp-api-2.1;6.1.14 in default
[ivy:resolve]   found org.mortbay.jetty#servlet-api-2.5;6.1.14 in fs
[ivy:resolve]   found org.mortbay.jetty#jsp-2.1;6.1.14 in default
[ivy:resolve]   found org.eclipse.jdt#core;3.1.1 in fs
[ivy:resolve]   found ant#ant;1.6.5 in fs
[ivy:resolve]   found commons-el#commons-el;1.0 in fs
[ivy:resolve]   found net.java.dev.jets3t#jets3t;0.7.1 in fs
[ivy:resolve]   found net.sf.kosmosfs#kfs;0.3 in fs
[ivy:resolve]   found hsqldb#hsqldb;1.8.0.10 in fs
[ivy:resolve]   found org.apache.hadoop#hadoop-test;1.0.0 in maven2
[ivy:resolve]   found org.apache.ftpserver#ftplet-api;1.0.0 in fs
[ivy:resolve]   found org.apache.mina#mina-core;2.0.0-M5 in fs
[ivy:resolve]   found org.slf4j#slf4j-api;1.5.2 in fs
[ivy:resolve]   found org.apache.ftpserver#ftpserver-core;1.0.0 in fs
[ivy:resolve]   found org.apache.ftpserver#ftpserver-deprecated;1.0.0-M2 in fs
[ivy:resolve]   found commons-io#commons-io;2.3 in maven2
[ivy:resolve]   found org.apache.httpcomponents#httpclient;4.1 in maven2
[ivy:resolve]   found org.apache.httpcomponents#httpcore;4.1 in maven2
[ivy:resolve]   found log4j#log4j;1.2.16 in fs
[ivy:resolve]   found org.slf4j#slf4j-log4j12;1.6.1 in fs
[ivy:resolve]   found org.apache.avro#avro;1.5.3 in fs
[ivy:resolve]   found com.thoughtworks.paranamer#paranamer;2.3 in fs
[ivy:resolve]   found org.xerial.snappy#snappy-java;1.0.3.2 in fs
[ivy:resolve]   found org.slf4j#slf4j-api;1.6.1 in fs
[ivy:resolve]   found com.googlecode.json-simple#json-simple;1.1 in fs
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in fs
[ivy:resolve]   found jline#jline;0.9.94 in fs
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found org.codehaus.groovy#groovy-all;1.8.6 in maven2
[ivy:resolve]   found org.codehaus.jackson#jackson-mapper-asl;1.8.8 in fs
[ivy:resolve]   found org.codehaus.jackson#jackson-core-asl;1.8.8 in fs
[ivy:resolve]   found org.fusesource.jansi#jansi;1.9 in maven2
[ivy:resolve]   found joda-time#joda-time;2.1 in maven2
[ivy:resolve]   found com.google.guava#guava;11.0 in maven2
[ivy:resolve]   found org.python#jython-standalone;2.5.2 in maven2
[ivy:resolve]   found rhino#js;1.7R2 in maven2
[ivy:resolve]   found org.antlr#antlr;3.4 in maven2
[ivy:resolve]   found org.antlr#antlr-runtime;3.4 in maven2
[ivy:resolve]   found org.antlr#stringtemplate;3.2.1 in maven2
[ivy:resolve]   found antlr#antlr;2.7.7 in fs
[ivy:resolve]   found org.antlr#ST4;4.0.4 in maven2
[ivy:resolve]   found org.apache.zookeeper#zookeeper;3.4.4 in maven2
[ivy:resolve]   found dk.brics.automaton#automaton;1.11-8 in maven2
[ivy:resolve]   found org.jruby#jruby-complete;1.6.7 in maven2
[ivy:resolve]   found org.apache.hbase#hbase;0.94.1 in maven2
[ivy:resolve]   found org.vafer#jdeb;0.8 in maven2
[ivy:resolve]   found org.mockito#mockito-all;1.8.4 in maven2
[ivy:resolve]   found xalan#xalan;2.7.1 in maven2
[ivy:resolve]   found xalan#serializer;2.7.1 in maven2
[ivy:resolve]   found xml-apis#xml-apis;1.3.04 in fs
[ivy:resolve]   found xerces#xercesImpl;2.10.0 in maven2
[ivy:resolve]   found xml-apis#xml-apis;1.4.01 in maven2
[ivy:resolve]   found junit#junit;4.11 in maven2
[ivy:resolve]   found org.hamcrest#hamcrest-core;1.3 in maven2
[ivy:resolve]   found org.jboss.netty#netty;3.2.2.Final in fs
[ivy:resolve]   found com.github.stephenc.high-scale-lib#high-scale-lib;1.1.1 
in fs
[ivy:resolve]   found com.google.protobuf#protobuf-java;2.4.0a in fs
[ivy:resolve]   found com.yammer.metrics#metrics-core;2.1.2 in fs
[ivy:resolve]   found org.slf4j#slf4j-api;1.6.4 in fs
[ivy:resolve]   found org.apache.hive#hive-exec;0.8.0 in maven2
[ivy:resolve]   found junit#junit;3.8.1 in fs
[ivy:resolve]   found com.google.code.p.arat#rat-lib;0.5.1 in maven2
[ivy:resolve]   found commons-collections#commons-collections;3.2 in fs
[ivy:resolve]   found commons-lang#commons-lang;2.1 in fs
[ivy:resolve]   found jdiff#jdiff;1.0.9 in fs
[ivy:resolve]   found checkstyle#checkstyle;4.2 in maven2
[ivy:resolve]   found commons-beanutils#commons-beanutils-core;1.7.0 in fs
[ivy:resolve]   found commons-cli#commons-cli;1.0 in fs
[ivy:resolve]   found commons-logging#commons-logging;1.0.3 in fs
[ivy:resolve]   found org.codehaus.jackson#jackson-mapper-asl;1.0.1 in fs
[ivy:resolve]   found org.codehaus.jackson#jackson-core-asl;1.0.1 in fs
[ivy:resolve]   found com.sun.jersey#jersey-bundle;1.8 in maven2
[ivy:resolve]   found com.sun.jersey#jersey-server;1.8 in fs
[ivy:resolve]   found com.sun.jersey.contribs#jersey-guice;1.8 in fs
[ivy:resolve]   found commons-httpclient#commons-httpclient;3.1 in fs
[ivy:resolve]   found ja

[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig

2013-01-25 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563006#comment-13563006
 ] 

Russell Jurney commented on PIG-2764:
-

Reviewed, albeit late. There was a typo, and some lint questions.

> Add a biginteger and bigdecimal type to pig
> ---
>
> Key: PIG-2764
> URL: https://issues.apache.org/jira/browse/PIG-2764
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch, 
> PIG-2764-2_nows.patch, PIG-2764-2.patch, PIG-2764-3.patch, PIG-2764-4.patch, 
> PIG-2764-5.patch, PIG-2764-6.patch
>
>
> I think it would be useful for applications where precision is more important 
> than speed to have the option of using java's bigdecimal and biginteger types 
> natively.
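
As a reminder of why these types matter, here is a short plain-Java sketch of
the precision trade-off the description refers to. It only uses
java.math.BigDecimal and java.math.BigInteger directly; PrecisionSketch is a
made-up class name, and nothing here shows how the patch exposes the types in
Pig itself.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustrative only; this is plain Java, not Pig code.
final class PrecisionSketch {

    /** Binary floating point cannot represent 0.1 or 0.2 exactly. */
    static double doubleSum() {
        return 0.1 + 0.2;
    }

    /** Decimal arithmetic on string-constructed values is exact. */
    static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    /** long wraps around silently on overflow. */
    static long longOverflow() {
        return Long.MAX_VALUE + 1;
    }

    /** BigInteger grows as needed instead of wrapping. */
    static BigInteger bigIntegerSum() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(doubleSum());      // 0.30000000000000004
        System.out.println(decimalSum());     // 0.3
        System.out.println(longOverflow());   // -9223372036854775808
        System.out.println(bigIntegerSum());  // 9223372036854775808
    }
}
```

The exact results come at a cost: BigDecimal and BigInteger allocate objects
and are slower than primitive arithmetic, which is the speed side of the
trade-off.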



Re: Review Request: Add BigInteger and BigDecimal to Pig

2013-01-25 Thread Russell Jurney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9012/#review15524
---


src/org/apache/pig/backend/hadoop/BigDecimalWritable.java:70: spelling error in 
TODO big/bit

There seem to be lots of LINT changes: tabs are not kept on newline-only 
lines. Not sure if that is right? Please check LINT.



- Russell Jurney


On Jan. 22, 2013, 10:05 p.m., Jonathan Coveney wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9012/
> ---
> 
> (Updated Jan. 22, 2013, 10:05 p.m.)
> 
> 
> Review request for pig, Alan Gates and Mathias Herberts.
> 
> 
> Description
> ---
> 
> This patch adds big integer and big decimal support to Pig. It could use more 
> tests, something I'd appreciate feedback on (but I wanted to make sure the 
> core implementation is good)
> 
> 
> This addresses bug PIG-2764.
> https://issues.apache.org/jira/browse/PIG-2764
> 
> 
> Diffs
> -
> 
>   .gitignore cc62d7d 
>   src/org/apache/pig/LoadCaster.java 574769b 
>   src/org/apache/pig/PigWarning.java 5de075f 
>   src/org/apache/pig/StoreCaster.java 5fe48de 
>   src/org/apache/pig/backend/hadoop/BigDecimalWritable.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/BigIntegerWritable.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/HDataType.java 84a56b8 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 96fba6b 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigDecimalRawComparator.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigIntegerRawComparator.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java 9749339 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java f40eb43 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Add.java c84b767 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java db3840f 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Divide.java 4656c28 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java 6683beb 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java 2806336 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java d64a080 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java 704d0b8 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java 9dc929e 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java 0320698 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Mod.java 6819185 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Multiply.java 7b57bed 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java 79a4461 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POBinCond.java 08544d5 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java e8c2f2c 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POIsNull.java f20b839 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PONegative.java c076ae7 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 8887133 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserComparisonFunc.java 479eb83 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 3c7e741 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Subtract.java 79d4c73 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java bf2ba08 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java ddb25f1 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java aa11409 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relat

Re: Got a build error: missing required library: 'build/ivy/lib/Pig/javacc-4.2.jar'

2013-01-25 Thread Alan Gates
javacc is still used in Pig trunk.  The main parser was replaced by ANTLR in 
0.9, but there are still several places Pig uses javacc.

Alan.

On Jan 25, 2013, at 11:57 AM, Kyungho Jeon wrote:

> Hello,
> 
> I just cloned the Pig svn repository and imported it into Eclipse.
> I immediately got the error in the title.
> 
> I found from jira that javacc was to be removed in Pig 0.9. But I also
> found many dependencies on javacc (cc-compile) in build.xml. So is it
> still there for some reason, or just because they have not yet been
> cleaned up?
> 
> Thanks,
> Kyungho.
> 
> (It's my first e-mail to any kind of dev mailing list. So if this is not
> an adequate way of reporting a problem and/or asking a question,
> please let me know and I will fix it.)



[jira] [Updated] (PIG-2764) Add a biginteger and bigdecimal type to pig

2013-01-25 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2764:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

It's in! Thank you all for the eyes.

> Add a biginteger and bigdecimal type to pig
> ---
>
> Key: PIG-2764
> URL: https://issues.apache.org/jira/browse/PIG-2764
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch, 
> PIG-2764-2_nows.patch, PIG-2764-2.patch, PIG-2764-3.patch, PIG-2764-4.patch, 
> PIG-2764-5.patch, PIG-2764-6.patch
>
>
> I think it would be useful for applications where precision is more important 
> than speed to have the option of using java's bigdecimal and biginteger types 
> natively.
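
As a rough illustration of the precision argument in the description above (this is not code from the patch), the sketch below contrasts Java's binary floating point with java.math.BigDecimal and java.math.BigInteger. The class and method names are made up for the example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical demo class; not part of Pig or of the PIG-2764 patch.
class PrecisionDemo {
    // double cannot represent 0.1 or 0.2 exactly, so the sum is not exactly 0.3.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // BigDecimal values built from decimal strings keep the exact decimal digits.
    static boolean decimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    // BigInteger keeps growing where a long would wrap around.
    static BigInteger beyondLong() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println("double sum exact:     " + doubleSumIsExact());
        System.out.println("BigDecimal sum exact: " + decimalSumIsExact());
        System.out.println("Long.MAX_VALUE + 1:   " + beyondLong());
    }
}
```

The trade-off is the one the description names: the arbitrary-precision types are exact but slower and heavier than the primitive types.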

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2764) Add a biginteger and bigdecimal type to pig

2013-01-25 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2764:
--

Attachment: PIG-2764-6.patch

This is the patch I applied to trunk (there were some very minor merge 
conflicts, is all)

> Add a biginteger and bigdecimal type to pig
> ---
>
> Key: PIG-2764
> URL: https://issues.apache.org/jira/browse/PIG-2764
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch, 
> PIG-2764-2_nows.patch, PIG-2764-2.patch, PIG-2764-3.patch, PIG-2764-4.patch, 
> PIG-2764-5.patch, PIG-2764-6.patch
>
>
> I think it would be useful for applications where precision is more important 
> than speed to have the option of using java's bigdecimal and biginteger types 
> natively.



Got a build error: missing required library: 'build/ivy/lib/Pig/javacc-4.2.jar'

2013-01-25 Thread Kyungho Jeon
Hello,

I just cloned the Pig svn repository and imported it into Eclipse.
I immediately got the error in the title.

I found from jira that javacc was to be removed in Pig 0.9. But I also
found many dependencies on javacc (cc-compile) in build.xml. So is it
still there for some reason, or just because they have not yet been
cleaned up?

Thanks,
Kyungho.

(It's my first e-mail to any kind of dev mailing list. So if it's not
an adequate way of reporting a problem and/or asking a question,
please let me know and fix them.)


[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig

2013-01-25 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562926#comment-13562926
 ] 

Alan Gates commented on PIG-2764:
-

Sorry, didn't realize you were waiting for me.  +1, since you addressed all my 
comments.  By the way, thanks for doing this.  I think it's huge for Pig.

> Add a biginteger and bigdecimal type to pig
> ---
>
> Key: PIG-2764
> URL: https://issues.apache.org/jira/browse/PIG-2764
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch, 
> PIG-2764-2_nows.patch, PIG-2764-2.patch, PIG-2764-3.patch, PIG-2764-4.patch, 
> PIG-2764-5.patch
>
>
> I think it would be useful for applications where precision is more important 
> than speed to have the option of using java's bigdecimal and biginteger types 
> natively.



[jira] [Updated] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2417:


Status: Open  (was: Patch Available)

Patch no longer applies cleanly to trunk.

> Streaming UDFs -  allow users to easily write UDFs in scripting languages 
> with no JVM implementation.
> -
>
> Key: PIG-2417
> URL: https://issues.apache.org/jira/browse/PIG-2417
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11
>Reporter: Jeremy Karn
>Assignee: Jeremy Karn
> Attachments: streaming2.patch, streaming3.patch, streaming.patch
>
>
> The goal of Streaming UDFs is to allow users to easily write UDFs in 
> scripting languages with no JVM implementation or a limited JVM 
> implementation.  The initial proposal is outlined here: 
> https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
> In order to implement this we need new syntax to distinguish a streaming UDF 
> from an embedded JVM UDF.  I'd propose something like the following (although 
> I'm not sure 'language' is the best term to be using):
> {code}define my_streaming_udfs language('python') 
> ship('my_streaming_udfs.py'){code}
> We'll also need a language-specific controller script that gets shipped to 
> the cluster which is responsible for reading the input stream, deserializing 
> the input data, passing it to the user written script, serializing that 
> script output, and writing that to the output stream.
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc.  This 
> class will likely share some of the existing code in POStream and 
> ExecutableManager (where it makes sense to pull out shared code) to stream 
> data to/from the controller script.
> One alternative approach to creating the StreamingUDF EvalFunc is to use the 
> POStream operator directly.  This would involve inserting the POStream 
> operator instead of the POUserFunc operator whenever we encountered a 
> streaming UDF while building the physical plan.  This approach seemed 
> problematic because there would need to be a lot of changes in order to 
> support POStream in all of the places we want to be able to use UDFs (for 
> example, to operate on a single field inside of a foreach statement).
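
The controller loop described above (read the input stream, deserialize, call the user's function, serialize, write to the output stream) can be sketched roughly as follows. This is only an illustration under stated assumptions: the real controller would be written in the scripting language itself, the tab-delimited, newline-terminated wire format is an assumption rather than the proposal's format, and every name in the block is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch of a streaming controller loop; not the PIG-2417 implementation.
class StreamingControllerSketch {
    // Reads one record per line, applies the user function, writes one record per line.
    static void run(InputStream in, OutputStream out, Function<String[], String[]> userFunc) {
        try {
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            Writer writer =
                new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", -1);   // deserialize (assumed format)
                String[] result = userFunc.apply(fields); // pass to the user-written function
                writer.write(String.join("\t", result));  // serialize the output
                writer.write('\n');
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Example user function: upper-cases every field.
    static String[] upperCase(String[] fields) {
        String[] result = new String[fields.length];
        for (int i = 0; i < fields.length; i++) {
            result[i] = fields[i].toUpperCase();
        }
        return result;
    }

    // Convenience wrapper that runs the loop over an in-memory string.
    static String runOnString(String input, Function<String[], String[]> userFunc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        run(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)), out, userFunc);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For example, runOnString("a\tb\n", StreamingControllerSketch::upperCase) pushes one record through the loop and returns the serialized result.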



[jira] [Updated] (PIG-2507) Semicolon in paramenters for UDF results in parsing error

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2507:


Status: Open  (was: Patch Available)

Changes to the code look fine, but we definitely need a unit test to check that 
they work.  Adding it in TestGrunt as Rohini suggested makes sense.  Canceling 
the patch pending adding of tests.

> Semicolon in paramenters for UDF results in parsing error
> -
>
> Key: PIG-2507
> URL: https://issues.apache.org/jira/browse/PIG-2507
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.9.1, 0.8.0
>Reporter: Vivek Padmanabhan
>Assignee: Timothy Chen
> Attachments: PIG_2507.patch
>
>
> If I have a semicolon in the parameter passed to a udf, the script execution 
> will fail with a parsing error.
> a = load 'i1' as (f1:chararray);
> c = foreach a generate REGEX_EXTRACT(f1, '.;' ,1);
> dump c;
> The above script fails with the below error 
> [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:  line 3, column 0>  mismatched character '' expecting '''
> Even replacing the semicolon with Unicode \u003B results in the same error.
> c = foreach a generate REGEX_EXTRACT(f1, '.\u003B',1);



[jira] [Updated] (PIG-2312) NPE when relation and column share the same name and used in Nested Foreach

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2312:


Status: Open  (was: Patch Available)

Latest patch no longer applies to trunk.

> NPE when relation and column share the same name and used in Nested Foreach 
> 
>
> Key: PIG-2312
> URL: https://issues.apache.org/jira/browse/PIG-2312
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Vivek Padmanabhan
>Assignee: Vivek Padmanabhan
> Attachments: PIG-2312_1.patch, PIG-2312_2.patch, PIG-2312_3.patch
>
>
> With Pig 0.9, if a relation and a column have the same name and the column 
> is used in a nested foreach, script execution fails 
> while compiling.
> The trace is below:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.pig.newplan.logical.visitor.ScalarVisitor$1.visit(ScalarVisitor.java:63)
>   at 
> org.apache.pig.newplan.logical.expression.ScalarExpression.accept(ScalarExpression.java:109)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>   at 
> org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:142)
>   at 
> org.apache.pig.newplan.logical.relational.LOSort.accept(LOSort.java:119)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at 
> org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:104)
>   at 
> org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1674)
>   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1666)
>   at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1391)
>   at org.apache.pig.PigServer.execute(PigServer.java:1293)
>   at org.apache.pig.PigServer.executeBatch(PigServer.java:359)
>   at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:131)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:192)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
>   at org.apache.pig.Main.run(Main.java:553)
>   at org.apache.pig.Main.main(Main.java:108)
> {code}
> This can be reproduced with the script below:
> {code}
> f3 = load 'input.txt' as (a1:chararray);
> A = load '3char_1long_tab' as (f1:chararray, f2:chararray,
>     f3:chararray, ct:long);
> B = GROUP A BY f1;
> C = FOREACH B {
>     zip_ordered = ORDER A BY f3 ASC;
>     GENERATE
>         FLATTEN(group) AS f1,
>         A.(f3, ct),
>         COUNT(zip_ordered),
>         SUM(A.ct) AS total;
> };
> STORE C INTO 'deletemeanytimeplease';
> {code}
> Checked with a unit test in trunk; the behavior is still the same. 



[jira] [Updated] (PIG-2645) PigSplit does not handle the case where SerializationFactory returns null

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2645:


   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Fix checked into trunk and branch.  Thanks Shami.

> PigSplit does not handle the case where SerializationFactory returns null
> -
>
> Key: PIG-2645
> URL: https://issues.apache.org/jira/browse/PIG-2645
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Alex Levenson
>Assignee: Shami B
>  Labels: patch
> Fix For: 0.11
>
> Attachments: patch_2645.patch, PIG-2645.patch
>
>
> In PigSplit.java, line 254:
> {code}
> SerializationFactory sf = new SerializationFactory(conf);
> Serializer s = sf.getSerializer(wrappedSplits[0].getClass());
> s.open((OutputStream) os);
> {code}
> sf.getSerializer returns null when it cannot find a serializer for a given 
> object. Instead of handling this properly, an NPE is thrown when s.open() is 
> called.
> This is easy to encounter when creating a custom InputSplit from the 
> mapreduce package which is an abstract class that DOES NOT implement Writable.
> However it's easy to miss because InputSplit from the mapred package is an 
> interface that extends Writable, and InputSplits often both extend and 
> implement both the new and old InputSplit abstract class and interface 
> (thereby becoming Writable).
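
One way to handle the null return described above is to check the lookup result and fail with a descriptive error before the serializer is used. The block below is a minimal, self-contained sketch of that idea, not the committed patch: Serializer and Factory are simplified stand-ins for the Hadoop classes, and requireSerializer and tryLookup are hypothetical helper names.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the nested types only imitate the behaviour described in the issue.
class SerializerLookupSketch {
    // Minimal stand-in for a serializer.
    interface Serializer<T> {
        String describe();
    }

    // Stand-in factory that returns null when no serializer is registered for a class.
    static class Factory {
        private final Map<Class<?>, Serializer<?>> registered = new HashMap<>();

        <T> void register(Class<T> type, Serializer<T> serializer) {
            registered.put(type, serializer);
        }

        @SuppressWarnings("unchecked")
        <T> Serializer<T> getSerializer(Class<T> type) {
            return (Serializer<T>) registered.get(type);
        }
    }

    // Turns the silent null into an explicit, descriptive failure.
    static <T> Serializer<T> requireSerializer(Factory factory, Class<T> type) throws IOException {
        Serializer<T> serializer = factory.getSerializer(type);
        if (serializer == null) {
            throw new IOException("No serializer found for " + type.getName()
                + "; does the split class implement Writable?");
        }
        return serializer;
    }

    // Returns "found" on success, or the error message when the lookup fails.
    static String tryLookup(Factory factory, Class<?> type) {
        try {
            requireSerializer(factory, type);
            return "found";
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```

With a check like this, the caller sees which class lacked a serializer instead of a bare NullPointerException from the later open() call.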



[jira] [Updated] (PIG-2645) PigSplit does not handle the case where SerializationFactory returns null

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2645:


Assignee: Shami B

> PigSplit does not handle the case where SerializationFactory returns null
> -
>
> Key: PIG-2645
> URL: https://issues.apache.org/jira/browse/PIG-2645
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Alex Levenson
>Assignee: Shami B
>  Labels: patch
> Attachments: patch_2645.patch, PIG-2645.patch
>
>
> In PigSplit.java, line 254:
> {code}
> SerializationFactory sf = new SerializationFactory(conf);
> Serializer s = sf.getSerializer(wrappedSplits[0].getClass());
> s.open((OutputStream) os);
> {code}
> sf.getSerializer returns null when it cannot find a serializer for a given 
> object. Instead of handling this properly, an NPE is thrown when s.open() is 
> called.
> This is easy to encounter when creating a custom InputSplit from the 
> mapreduce package which is an abstract class that DOES NOT implement Writable.
> However it's easy to miss because InputSplit from the mapred package is an 
> interface that extends Writable, and InputSplits often both extend and 
> implement both the new and old InputSplit abstract class and interface 
> (thereby becoming Writable).



Re: Run a job async

2013-01-25 Thread Jonathan Coveney
user to bcc, +dev

Cheolsoo,

Can you make a JIRA for this? I can imagine a slightly heavier test suite,
but I like where you started. If it's not far off, then I think it'll be a
win to make it thread safe. But we need to make sure to test the most
advanced features: UDFs (especially the same name but a different UDF in
different invocations), scripting UDFs (same thing), and so on.
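
The "same name but different UDF in different invocations" case can be sketched as a small concurrency test. The block below is a toy model and does not use any Pig classes: SHARED stands in for a JVM-wide static registry of the kind discussed in this thread, and all names are hypothetical. Each of the four threads registers its own definition under the same name and then checks whether it reads back what it registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of static state shared by concurrent invocations; not Pig code.
class StaticStateClash {
    // Hypothetical static registry (name -> definition) shared by the whole JVM.
    static final Map<String, String> SHARED = new ConcurrentHashMap<>();

    // First registration wins; later callers silently get someone else's definition.
    static String registerShared(String name, String definition) {
        SHARED.putIfAbsent(name, definition);
        return SHARED.get(name);
    }

    // Same logic, but the registry belongs to a single invocation.
    static String registerIsolated(Map<String, String> local, String name, String definition) {
        local.putIfAbsent(name, definition);
        return local.get(name);
    }

    // Counts how many invocations read back a definition other than their own.
    static int countMismatches(boolean isolated, int threads) {
        SHARED.clear();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final String definition = "udf-v" + i;
                Callable<Boolean> task = () -> {
                    String seen = isolated
                        ? registerIsolated(new HashMap<>(), "my_udf", definition)
                        : registerShared("my_udf", definition);
                    return !seen.equals(definition);
                };
                results.add(pool.submit(task));
            }
            int mismatches = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    mismatches++;
                }
            }
            return mismatches;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this model, countMismatches(false, 4) reports that every invocation except the first to register picks up the wrong definition, while countMismatches(true, 4) reports none. A real test suite would need to exercise the actual Pig front-end rather than a stand-in like this.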


2013/1/25 Cheolsoo Park 

> >> if you have multiple threads that run a query via PigServer, there is a
> great chance of the internals clashing because of the use of static
> variables within Pig.
>
> Recently, I spent some time on this, and what I found is that the Pig
> front-end is quite thread-safe. Here is how I tested it:
>
> 1) Wrote a PigUnit test that runs in MR mode.
> 2) Executed test cases concurrently in 4 threads using a JUnit extension
> called tempus-fugit:
> http://tempusfugitlibrary.org/documentation/junit/parallel/
>
> After fixing PIG-3096, I was able to successfully run Pig queries in
> parallel. It's important to note that only the front-end needs to be
> thread-safe since that's what is executed in parallel.
>
> I arbitrarily selected queries from e2e test cases, so they are probably
> not complex enough to mimic real-world examples. Nevertheless, my test
> program ran without a problem for a few days. I couldn't continue my
> experiment because I was pulled onto something else. However, I think
> that making the front-end thread-safe is an achievable goal.
>
> Thanks,
> Cheolsoo
>
>
>
> On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam
> wrote:
>
> > That clarifies it for me, thanks a lot.
> >
> > Regards,
> > Rama.
> >
> >
> > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney wrote:
> >
> > > Well, when I say that Pig is not multi-threaded, what I mean is that if
> > > you have multiple threads that run a query via PigServer, there is a
> > > great chance of the internals clashing because of the use of static
> > > variables within Pig. Pig itself, when running a single query, is
> > > multi-threaded. It's just not "multi-threaded" in the sense that
> > > multiple instances can safely be run in the same JVM.
> > >
> > >
> > > 2013/1/24 Ramakrishna Nalam 
> > >
> > > > Hi Jonathan,
> > > >
> > > > Pardon if it's a naive question, but it's interesting that you say Pig
> > > > is not multithreaded.
> > > > We're using Pig 0.10.0, and looking at the code, it seems to do the
> > > > right things to handle multi-threaded requests (ThreadLocal for
> > > > ScriptState, for example).
> > > >
> > > > It would be great if you could point out the kinds of issues there
> > > > could be.
> > > >
> > > >
> > > > Regards,
> > > > Rama.
> > > >
> > > >
> > > >
> > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M 
> > > > wrote:
> > > >
> > > > > Are there any plans on making the pigserver multi-threaded?
> > > > >
> > > > > Since there is "PigProgressNotificationListener" to subscribe for
> > > > > async callbacks when the pig job completes, is there any real need
> > > > > to keep the pig job submitting thread waiting until the job
> > > > > completes?
> > > > >
> > > > > Is this just a shortcoming today or are there more concrete reasons
> > > > > against providing a pigserver which can submit to the cluster in
> > > > > mapreduce mode async?
> > > > >
> > > > > Thanks,
> > > > > Praveen
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
> > > > >
> > > > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > > > asynchronously is going to be a bear. I mean, this is essentially
> > > > > > what the job tracker does, albeit with a lot less information.
> > > > > >
> > > > > > Either way, Pig is not multi-threaded so having more than one
> > > > > > instance of Pig in the same JVM is going to start causing problems
> > > > > > (which is why, I imagine, there is no async way to call Pig). So
> > > > > > multiple processes is really the only way around it that I know of.
> > > > > >
> > > > > > At Twitter we have a deployment of mesos, and our long term
> > > > > > solution is going to be running all of our pig jobs on mesos, in
> > > > > > the short term by deploying daemons that run pig jobs as local
> > > > > > processes.
> > > > > >
> > > > > >
> > > > > > 2013/1/23 Prashant Kommireddi 
> > > > > >
> > > > > > > Both. Think of it as an app server handling all of these
> > > > > > > requests.
> > > > > > >
> > > > > > > Sent from my iPhone
> > > > > > >
> > > > > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > > > > >
> > > > > > > >
> > > > > > > > 2013/1/23 Prashant Kommireddi 
> > > > > > > >
> > > > > > > >> Did not want to have several threads launched for this. We
> > might
> > > > > have
> > > > > > > >> thousands