[jira] [Updated] (PIG-3286) TestPigContext.testImportList fails in trunk

2013-05-01 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3286:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thank you Daniel and Prashant for reviewing it.

> TestPigContext.testImportList fails in trunk
> 
>
> Key: PIG-3286
> URL: https://issues.apache.org/jira/browse/PIG-3286
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.12
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
>  Labels: test
> Fix For: 0.12
>
> Attachments: PIG-3286-2.patch, PIG-3286.patch
>
>
> To reproduce, run ant clean test -Dtestcase=TestPigContext. It fails with the 
> following error:
> {code}
> junit.framework.AssertionFailedError: expected:<5> but was:<6>
>   at 
> org.apache.pig.test.TestPigContext.testImportList(TestPigContext.java:157)
> {code}
> This is a regression from PIG-3198, which added "java.lang." to the default 
> import list. Here is the relevant code:
> {code}
> @@ -739,6 +739,7 @@ public class PigContext implements Serializable {
>  if (packageImportList.get() == null) {
>  ArrayList importlist = new ArrayList();
>  importlist.add("");
> +importlist.add("java.lang.");
>  importlist.add("org.apache.pig.builtin.");
>  importlist.add("org.apache.pig.impl.builtin.");
>  packageImportList.set(importlist);
> {code}
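For context, short class names in Pig scripts are resolved by trying each prefix in this list until one yields a loadable class; a minimal, self-contained sketch of that resolution loop (hypothetical class and method names, not Pig's actual implementation) is:

```java
import java.util.Arrays;
import java.util.List;

public class ImportListDemo {
    // Mirrors the default prefixes from the diff above, including the new "java.lang.".
    static final List<String> IMPORT_LIST = Arrays.asList(
            "", "java.lang.", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin.");

    // Try each prefix until one produces a loadable class.
    static Class<?> resolve(String name) {
        for (String prefix : IMPORT_LIST) {
            try {
                return Class.forName(prefix + name);
            } catch (ClassNotFoundException e) {
                // fall through to the next prefix
            }
        }
        throw new IllegalArgumentException("Cannot resolve: " + name);
    }

    public static void main(String[] args) {
        // "String" now resolves via the "java.lang." prefix added by PIG-3198.
        System.out.println(resolve("String").getName()); // prints "java.lang.String"
    }
}
```

Because the list gained an entry, any test that asserted the list's exact size would start failing, which matches the expected:<5> but was:<6> assertion above.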

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Want to contribute

2013-05-01 Thread Naidu MS
Thanks Daniel.
I am new to this group.

Regards,
Naidu


On Thu, May 2, 2013 at 3:45 AM, Daniel Dai  wrote:

> Hi, Naidu,
> Those are Hadoop questions and you should send them to u...@hadoop.apache.org. A
> quick answer to the double "hadoop namenode -format" question: once you do
> that, you lose the metadata, and you will have trouble getting your HDFS files
> back.
>
> Thanks,
> Daniel
>
>
> On Wed, May 1, 2013 at 12:25 AM, Naidu MS
> wrote:
>
> > Hi, I have two questions regarding HDFS and the jps utility.
> >
> > I am new to Hadoop and started learning it over the past week.
> >
> > 1. Whenever I run start-all.sh and then jps in the console, it lists the
> > started processes:
> >
> > naidu@naidu:~/work/hadoop-1.0.4/bin$ jps
> > 22283 NameNode
> > 23516 TaskTracker
> > 26711 Jps
> > 22541 DataNode
> > 23255 JobTracker
> > 22813 SecondaryNameNode
> > Could not synchronize with target
> >
> > But along with the list of processes started, it always shows "Could not
> > synchronize with target" in the jps output. What is meant by "Could not
> > synchronize with target"? Can someone explain why this is happening?
> >
> >
> > 2. Is it possible to format the namenode multiple times? When I enter the
> > namenode -format command, it does not format the namenode and shows the
> > following output:
> >
> > naidu@naidu:~/work/hadoop-1.0.4/bin$ hadoop namenode -format
> > Warning: $HADOOP_HOME is deprecated.
> >
> > 13/05/01 12:08:04 INFO namenode.NameNode: STARTUP_MSG:
> > /************************************************************
> > STARTUP_MSG: Starting NameNode
> > STARTUP_MSG:   host = naidu/127.0.0.1
> > STARTUP_MSG:   args = [-format]
> > STARTUP_MSG:   version = 1.0.4
> > STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
> > ************************************************************/
> > Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) y
> > Format aborted in /home/naidu/dfs/namenode
> > 13/05/01 12:08:05 INFO namenode.NameNode: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down NameNode at naidu/127.0.0.1
> > ************************************************************/
> >
> > Can someone help me understand this? Why is it not possible to format the
> > namenode multiple times?
> >
> >
> > On Wed, May 1, 2013 at 9:47 AM, Cheolsoo Park 
> > wrote:
> >
> > > Please see the following wiki page:
> > > https://cwiki.apache.org/confluence/display/PIG/HowToContribute
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > >
> > > On Tue, Apr 30, 2013 at 9:10 PM, Naidu MS
> > > wrote:
> > >
> > > > Hi, how do I get the source of Pig?
> > > > I am interested in reading the source code so that I can learn how the
> > > > framework is written.
> > > > I can help by fixing some minor bugs/JIRA issues.
> > > > Can someone tell me how to get the source code?
> > > >
> > > >
> > > >
> > > > Regards,
> > > > Naidu
> > > >
> > > >
> > > > On Wed, May 1, 2013 at 9:30 AM, Cheolsoo Park 
> > > > wrote:
> > > >
> > > > > Welcome to Pig. There are hundreds of open jiras:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20PIG%20AND%20status%20%3D%20Open%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC
> > > > >
> > > > > Please feel free to submit patches.
> > > > >
> > > > > Thanks,
> > > > > Cheolsoo
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Apr 30, 2013 at 4:16 PM, Vineet Nair 
> > > > wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > I was just going through the source code of Pig and I would very much
> > > > > > like to contribute to it.
> > > > > > I was just wondering if there are any small JIRA requests that I can
> > > > > > start working on.
> > > > > >
> > > > > > Thanks and regards,
> > > > > > Vineet
> > > > > >
> > > > >
> > > >
> > >
> >
>


Physical operators refactoring

2013-05-01 Thread Julien Le Dem
Just a heads up that I'm looking into this and that it is potentially a giant 
patch:
https://issues.apache.org/jira/browse/PIG-3307
Feedback appreciated.
Julien

[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647174#comment-13647174
 ] 

Julien Le Dem commented on PIG-3307:


It looks like we can get rid of the parameter that is only used for method 
dispatch.
I will replace all calls to getNext(Tuple t) with getNextTuple() in 
PhysicalOperator.
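The dispatch-only parameter being removed can be sketched outside Pig (simplified, hypothetical classes, not the real PhysicalOperator hierarchy): the old style passes a value solely to select an overload, while the refactored style encodes the type in the method name.

```java
// Sketch of the refactoring direction described above (not Pig's real classes).
public class DispatchDemo {
    static class Tuple {}

    // Old style: the argument t is never read; it exists only to pick this overload.
    static String getNext(Tuple t) {
        return "tuple";
    }
    static String getNext(Integer i) {
        return "integer";
    }

    // New style: the result type is part of the method name; no dummy argument.
    static String getNextTuple() {
        return "tuple";
    }
    static String getNextInteger() {
        return "integer";
    }

    public static void main(String[] args) {
        // The caller no longer has to allocate a throwaway value just for dispatch.
        System.out.println(getNext(new Tuple())); // old: needs a dummy Tuple
        System.out.println(getNextTuple());       // new: intent is explicit
    }
}
```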

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to clean up 
> some unnecessary code.
> In particular, there is a family of getNext(*T* v) methods where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring toward a more readable getNext*T*().



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_1.patch

PIG-3307_1.patch introduces some more refactoring




Re: A major addition to Pig. Working with spatial data

2013-05-01 Thread Ahmed Eldawy
Thanks for your response. I was never good at differentiating all those
open source licenses. I mean, what is the point of making open source licenses
if they block me from using a library in an open source project? Anyway,
I'm not going to debate this here. Just one question: if we use JTS as a
library (jar file) without adding its code to Pig, is it still a violation?
We'll use Ivy, for example, to download the jar file at compile time.
 On May 1, 2013 7:50 PM, "Alan Gates"  wrote:

> Passing on the technical details for a moment, I see a licensing issue.
>  JTS is licensed under LGPL.  Apache projects cannot contain or ship
> [L]GPL.  Apache does not meet the requirements of GPL and thus we cannot
> repackage their code. If you wanted to go forward using that class this
> would have to be packaged as an add on that was downloaded separately and
> not from Apache.  Another option is to work with the JTS community and see
> if they are willing to dual license their code under BSD or Apache license
> so that Pig could include it.  If neither of those are an option you would
> need to come up with a new class to contain your spatial data.
>
> Alan.
>
> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
>
> > Hi all,
> >  First, sorry for the long email. I wanted to put all my thoughts here
> and
> > get your feedback.
> >  I'm proposing a major addition to Pig that will greatly increase its
> > functionality and user base. It is simply to add spatial support to the
> > language and the framework. I've already started working on that but I
> > don't want it to be just another branch. I want it, eventually, to be
> > merged with the trunk of Apache Pig. So, I'm sending this email mainly to
> > reach out the main contributors of Pig to see the feasibility of this.
> > This addition is a part of a big project we have been working on in
> > University of Minnesota; the project is called Spatial Hadoop.
> > http://spatialhadoop.cs.umn.edu. It's about building a MapReduce
> framework
> > (Hadoop) that is capable of maintaining and analyzing spatial data
> > efficiently. I'm the main guy behind that project and since we released
> its
> > first version, we received very encouraging responses from different
> groups
> > in the research and industrial community. I'm sure the addition we want
> to
> > make to Pig Latin will be widely accepted by the people in the spatial
> > community.
> > I'm proposing a plan here while we're still in the early phases of this
> > task to be able to discuss it with the main contributors and see its
> > feasibility. First of all, I think that we need to change the core of Pig
> > to be able to support spatial data. Providing a set of UDFs only is not
> > enough. The main reason is that Pig Latin does not provide a way to
> create
> > a new data type which is needed for spatial data. Once we have the
> spatial
> > data types we need, the functionality can be expanded using more UDFs.
> >
> > Here's the plan as I see it.
> > 1- Introduce a new primitive data type Geometry which represents all
> > spatial data types. In the underlying system, this will map to
> > com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
> > Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable
> and
> > efficient open source Java library for spatial data types and algorithms.
> > It is very popular in the spatial community and a C++ port of it is used
> in
> > PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS also
> > conforms with Open Geospatial Consortium (OGC) [
> > http://www.opengeospatial.org/] which is an open standard for the
> spatial
> > data types. The Geometry data type is read from and written to text files
> > using the Well Known Text (WKT) format. There is also a way to convert it
> > to/from binary so that it can work with binary files and streams.
> > 2- Add functions that manipulate spatial data types. These will be added
> as
> > UDFs and we will not need to mess with the internals of Pig. Most
> probably,
> > there will be one new class for each operation (e.g., union or
> > intersection). I think it will be good to put these new operations inside
> > the core of Pig so that users can use it without having to write the
> fully
> > qualified class name. Also, since there is no way to implicitly cast a
> > spatial data type to a non-spatial data types, there will not be any
> > conflicts in existing operations or new operations. All new operations,
> and
> > only the new operations, will be working on spatial data types. Here is
> an
> > initial list of operations that can be added. All those operations are
> > already implemented in JTS and the UDFs added to Pig will be just
> wrappers
> > around them.
> > **Predicates (used for spatial filtering)
> > Equals
> > Disjoint
> > Intersects
> > Touches
> > Crosses
> > Within
> > Contains
> > Overlaps
> >
> > **Operations
> > Envelope
> > Area
> > Length
> > Buffer
> > ConvexHull
> > Intersection
> > Un

[jira] Subscription: PIG patch available

2013-05-01 Thread jira
Issue Subscription
Filter: PIG patch available (29 issues)

Subscriber: pigdaily

Key Summary
PIG-3297Avro files with stringType set to String cannot be read by the 
AvroStorage LoadFunc
https://issues.apache.org/jira/browse/PIG-3297
PIG-3295Casting from bytearray failing after Union (even when each field is 
from a single Loader)
https://issues.apache.org/jira/browse/PIG-3295
PIG-3291TestExampleGenerator fails on Windows because of lack of file name 
escaping
https://issues.apache.org/jira/browse/PIG-3291
PIG-3288Kill jobs if the number of output files is over a configurable limit
https://issues.apache.org/jira/browse/PIG-3288
PIG-3286TestPigContext.testImportList fails in trunk
https://issues.apache.org/jira/browse/PIG-3286
PIG-3285Jobs using HBaseStorage fail to ship dependency jars
https://issues.apache.org/jira/browse/PIG-3285
PIG-3281Pig version in pig.pom is incorrect in branch-0.11
https://issues.apache.org/jira/browse/PIG-3281
PIG-3258Patch to allow MultiStorage to use more than one index to generate 
output tree
https://issues.apache.org/jira/browse/PIG-3258
PIG-3257Add unique identifier UDF
https://issues.apache.org/jira/browse/PIG-3257
PIG-3247Piggybank functions to mimic OVER clause in SQL
https://issues.apache.org/jira/browse/PIG-3247
PIG-3223AvroStorage does not handle comma separated input paths
https://issues.apache.org/jira/browse/PIG-3223
PIG-3210Pig fails to start when it cannot write log to log files
https://issues.apache.org/jira/browse/PIG-3210
PIG-3199Expose LogicalPlan via PigServer API
https://issues.apache.org/jira/browse/PIG-3199
PIG-3166Update eclipse .classpath according to ivy library.properties
https://issues.apache.org/jira/browse/PIG-3166
PIG-3123Simplify Logical Plans By Removing Unneccessary Identity Projections
https://issues.apache.org/jira/browse/PIG-3123
PIG-3105Fix TestJobSubmission unit test failure.
https://issues.apache.org/jira/browse/PIG-3105
PIG-3097HiveColumnarLoader doesn't correctly load partitioned Hive table 
https://issues.apache.org/jira/browse/PIG-3097
PIG-3088Add a builtin udf which removes prefixes
https://issues.apache.org/jira/browse/PIG-3088
PIG-3069Native Windows Compatibility for Pig E2E Tests and Harness
https://issues.apache.org/jira/browse/PIG-3069
PIG-3026Pig checked-in baseline comparisons need a pre-filter to address 
OS-specific newline differences
https://issues.apache.org/jira/browse/PIG-3026
PIG-3025TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline 
script needs simplification
https://issues.apache.org/jira/browse/PIG-3025
PIG-3024TestEmptyInputDir unit test - hadoop version detection logic is 
brittle
https://issues.apache.org/jira/browse/PIG-3024
PIG-3015Rewrite of AvroStorage
https://issues.apache.org/jira/browse/PIG-3015
PIG-2959Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2955 Fix bunch of Pig e2e tests on Windows 
https://issues.apache.org/jira/browse/PIG-2955
PIG-2873Converting bin/pig shell script to python
https://issues.apache.org/jira/browse/PIG-2873
PIG-2248Pig parser does not detect when a macro name masks a UDF name
https://issues.apache.org/jira/browse/PIG-2248
PIG-2244Macros cannot be passed relation names
https://issues.apache.org/jira/browse/PIG-2244
PIG-1914Support load/store JSON data in Pig
https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


Re: A major addition to Pig. Working with spatial data

2013-05-01 Thread Alan Gates
Setting aside the technical details for a moment, I see a licensing issue. JTS is 
licensed under the LGPL. Apache projects cannot contain or ship [L]GPL code. Apache 
does not meet the requirements of the GPL and thus we cannot repackage their code. 
If you wanted to go forward using that class, it would have to be packaged as 
an add-on that is downloaded separately and not from Apache. Another option 
is to work with the JTS community and see if they are willing to dual-license 
their code under a BSD or Apache license so that Pig could include it. If 
neither of those is an option, you would need to come up with a new class to 
contain your spatial data.

Alan.


[jira] [Updated] (PIG-2970) Nested foreach getting incorrect schema when having unrelated inner query

2013-05-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2970:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to trunk.

> Nested foreach getting incorrect schema when having unrelated inner query
> -
>
> Key: PIG-2970
> URL: https://issues.apache.org/jira/browse/PIG-2970
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 0.10.0
>Reporter: Koji Noguchi
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-2970-0.patch, PIG-2970-1.patch, PIG-2970-2.patch, 
> pig-2970-trunk-v01.txt, pig-2970-trunk-v02.txt
>
>
> While looking at PIG-2968, hit a weird error message.
> {noformat}
> $ cat -n test/foreach2.pig
>  1  daily = load 'nyse' as (exchange, symbol);
>  2  grpd = group daily by exchange;
>  3  unique = foreach grpd {
>  4  sym = daily.symbol;
>  5  uniq_sym = distinct sym;
>  6  --ignoring uniq_sym result
>  7  generate group, daily;
>  8  };
>  9  describe unique;
> 10  zzz = foreach unique generate group;
> 11  explain zzz;
> % pig -x local -t ColumnMapKeyPrune test/foreach2.pig
> ...
> unique: {symbol: bytearray}
> 2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1025: 
>  Invalid field projection. 
> Projected field [group] does not exist in schema: symbol:bytearray.
> ...
> {noformat}



A major addition to Pig. Working with spatial data

2013-05-01 Thread Ahmed Eldawy
Hi all,
  First, sorry for the long email. I wanted to put all my thoughts here and
get your feedback.
  I'm proposing a major addition to Pig that will greatly increase its
functionality and user base. It is simply to add spatial support to the
language and the framework. I've already started working on that but I
don't want it to be just another branch. I want it, eventually, to be
merged into the trunk of Apache Pig. So, I'm sending this email mainly to
reach out to the main contributors of Pig and gauge the feasibility of this.
 This addition is a part of a big project we have been working on in
University of Minnesota; the project is called Spatial Hadoop.
http://spatialhadoop.cs.umn.edu. It's about building a MapReduce framework
(Hadoop) that is capable of maintaining and analyzing spatial data
efficiently. I'm the main guy behind that project and since we released its
first version, we received very encouraging responses from different groups
in the research and industrial community. I'm sure the addition we want to
make to Pig Latin will be widely accepted by the people in the spatial
community.
 I'm proposing a plan here while we're still in the early phases of this
task to be able to discuss it with the main contributors and see its
feasibility. First of all, I think that we need to change the core of Pig
to be able to support spatial data. Providing a set of UDFs only is not
enough. The main reason is that Pig Latin does not provide a way to create
a new data type which is needed for spatial data. Once we have the spatial
data types we need, the functionality can be expanded using more UDFs.

Here's the plan as I see it.
1- Introduce a new primitive data type Geometry which represents all
spatial data types. In the underlying system, this will map to
com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and
efficient open source Java library for spatial data types and algorithms.
It is very popular in the spatial community and a C++ port of it is used in
PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS also
conforms with Open Geospatial Consortium (OGC) [
http://www.opengeospatial.org/] which is an open standard for the spatial
data types. The Geometry data type is read from and written to text files
using the Well Known Text (WKT) format. There is also a way to convert it
to/from binary so that it can work with binary files and streams.
2- Add functions that manipulate spatial data types. These will be added as
UDFs and we will not need to mess with the internals of Pig. Most probably,
there will be one new class for each operation (e.g., union or
intersection). I think it will be good to put these new operations inside
the core of Pig so that users can use it without having to write the fully
qualified class name. Also, since there is no way to implicitly cast a
spatial data type to a non-spatial data type, there will not be any
conflicts in existing operations or new operations. All new operations, and
only the new operations, will be working on spatial data types. Here is an
initial list of operations that can be added. All those operations are
already implemented in JTS and the UDFs added to Pig will be just wrappers
around them.
**Predicates (used for spatial filtering)
Equals
Disjoint
Intersects
Touches
Crosses
Within
Contains
Overlaps

**Operations
Envelope
Area
Length
Buffer
ConvexHull
Intersection
Union
Difference
SymDifference

**Aggregate functions
Accum
ConvexHull
Union
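To make the proposal concrete, here is a minimal pure-Java sketch of the kind of predicate these wrappers would expose, using axis-aligned bounding boxes only (illustrative class and method names; the actual UDFs would delegate to JTS rather than reimplement geometry):

```java
// Minimal axis-aligned envelope implementing a few of the predicates listed above.
// Illustrative only: the proposed Pig UDFs would wrap JTS, not reimplement this.
public class EnvelopeDemo {
    static class Envelope {
        final double x1, y1, x2, y2; // lower-left and upper-right corners

        Envelope(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        // "Intersects": the two boxes share at least one point.
        boolean intersects(Envelope o) {
            return x1 <= o.x2 && o.x1 <= x2 && y1 <= o.y2 && o.y1 <= y2;
        }

        // "Contains": this box fully encloses the other.
        boolean contains(Envelope o) {
            return x1 <= o.x1 && y1 <= o.y1 && x2 >= o.x2 && y2 >= o.y2;
        }

        // "Within" is the converse of contains.
        boolean within(Envelope o) {
            return o.contains(this);
        }
    }

    public static void main(String[] args) {
        Envelope a = new Envelope(0, 0, 10, 10);
        Envelope b = new Envelope(2, 2, 5, 5);
        System.out.println(a.contains(b));   // true
        System.out.println(b.within(a));     // true
        System.out.println(a.intersects(new Envelope(20, 20, 30, 30))); // false
    }
}
```

In a Pig script, such a predicate would then be usable as a FILTER condition once the Geometry type and the wrapper UDFs exist.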

3- The third step is to implement spatial indexes (e.g., Grid or R-tree). A
Pig loader and Pig output classes will be created for those indexes. Note
that currently we have SpatialOutputFormat and SpatialInputFormat for those
indexes inside the Spatial Hadoop project, but we need to tweak them to
work with Pig.

4- (Advanced) Implement more sophisticated algorithms for spatial
operations that utilize the indexes. For example, we can have a specific
algorithm for spatial range query or spatial join. Again, we already have
algorithms built for different operations implemented in Spatial Hadoop as
MapReduce programs, but they will need to be modified to work in the Pig
environment and to interoperate with other operations.

This is my whole plan for the spatial extension to Pig. I've already
started with the first step but as I mentioned earlier, I don't want to do
the work for our project and then the work gets forgotten. I want to
contribute to Pig and do my research at the same time. If you think the
plan is plausible, I'll open JIRA issues for the above tasks and start
submitting patches. I'll conform to the standards of the project, such as
adding tests and commenting the code well.
Sorry for the long email and hope to hear back from you.


Best regards,
Ahmed Eldawy


[jira] [Commented] (PIG-3286) TestPigContext.testImportList fails in trunk

2013-05-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647138#comment-13647138
 ] 

Daniel Dai commented on PIG-3286:
-

+1




[jira] [Commented] (PIG-3305) Infinite loop when input path contains empty partition directory

2013-05-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647134#comment-13647134
 ] 

Daniel Dai commented on PIG-3305:
-

[~maczech] Can you make a patch for trunk?

> Infinite loop when input path contains empty partition directory 
> -
>
> Key: PIG-3305
> URL: https://issues.apache.org/jira/browse/PIG-3305
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.10.1
>Reporter: Marcin Czech
>Priority: Critical
> Fix For: 0.10.1
>
> Attachments: PIG-3305.patch
>
>




[jira] [Resolved] (PIG-3304) XMLLoader in piggybank does not work with inline closed tags

2013-05-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3304.
-

   Resolution: Fixed
Fix Version/s: 0.12
 Assignee: Ahmed Eldawy
 Hadoop Flags: Reviewed

Piggybank tests pass. Patch committed to trunk. Thanks Ahmed!

> XMLLoader in piggybank does not work with inline closed tags
> 
>
> Key: PIG-3304
> URL: https://issues.apache.org/jira/browse/PIG-3304
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11.1
>Reporter: Ahmed Eldawy
>Assignee: Ahmed Eldawy
>  Labels: patch
> Fix For: 0.12
>
> Attachments: xmlloader_inline_close_tag_1.patch, 
> xmlloader_inline_close_tag.patch
>
>
> The XMLLoader fails to return elements when tags are closed inline, such as
> a self-closing tag.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table

2013-05-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647027#comment-13647027
 ] 

Daniel Dai commented on PIG-3097:
-

The patch applies to trunk and all piggybank tests pass. Is it OK to commit it
to trunk?

> HiveColumnarLoader doesn't correctly load partitioned Hive table 
> -
>
> Key: PIG-3097
> URL: https://issues.apache.org/jira/browse/PIG-3097
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.1
>Reporter: Richard Ding
>Assignee: Richard Ding
>  Labels: patch
> Attachments: PIG-3097.patch
>
>
> Given a partitioned Hive table:
> {code}
> hive> describe mytable;
> OK
> f1              string
> f2              string
> f3              string
> partition_dt    string
> {code}
> The following Pig script gives the correct schema:
> {code}
> grunt> A = load '/hive/warehouse/mytable' using 
> org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 string,f3 
> string');
> grunt> describe A
> A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray}
> {code}
> But, the command
> {code}
> grunt> dump A
> {code}
> only produces the first column of all records in the table (all four columns 
> are expected).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2586) A better plan/data flow visualizer

2013-05-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647006#comment-13647006
 ] 

Daniel Dai commented on PIG-2586:
-

Thanks, I will take a look.

> A better plan/data flow visualizer
> --
>
> Key: PIG-2586
> URL: https://issues.apache.org/jira/browse/PIG-2586
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Daniel Dai
>  Labels: gsoc2013
>
> Pig supports a dot graph style plan to visualize the 
> logical/physical/mapreduce plan (explain with -dot option, see 
> http://ofps.oreilly.com/titles/9781449302641/developing_and_testing.html). 
> However, dot graph takes extra step to generate the plan graph and the 
> quality of the output is not good. It's better we can implement a better 
> visualizer for Pig. It should:
> 1. show operator type and alias
> 2. turn on/off output schema
> 3. dive into foreach inner plan on demand
> 4. provide a way to show operator source code, eg, tooltip of an operator 
> (plan don't currently have this information, but you can assume this is in 
> place)
> 5. besides visualize logical/physical/mapreduce plan, visualize the script 
> itself is also useful
> 6. may rely on some java graphic library such as Swing
> This is a candidate project for Google summer of code 2013. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2013

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Want to contribute

2013-05-01 Thread Daniel Dai
Hi, Naidu,
Those are Hadoop questions and you should send them to u...@hadoop.apache.org. A
quick answer to the double "hadoop namenode -format" question is: once you do
that, you lose the metadata and will have trouble getting HDFS files back.

Thanks,
Daniel


On Wed, May 1, 2013 at 12:25 AM, Naidu MS
wrote:

> Hi i have two questions regarding hdfs and jps utility
>
> I am new to Hadoop and started learning Hadoop over the past week
>
> 1. Whenever I run start-all.sh and then jps in the console, it shows the
> processes started
>
> *naidu@naidu:~/work/hadoop-1.0.4/bin$ jps*
> *22283 NameNode*
> *23516 TaskTracker*
> *26711 Jps*
> *22541 DataNode*
> *23255 JobTracker*
> *22813 SecondaryNameNode*
> *Could not synchronize with target*
>
> But along with the list of processes started, it always shows *"Could not
> synchronize with target"* in the jps output. What is meant by "Could not
> synchronize with target"? Can someone explain why this is happening?
>
>
> 2. Is it possible to format the namenode multiple times? When I enter the
> namenode -format command, it does not format the namenode and shows the
> following output.
>
> *naidu@naidu:~/work/hadoop-1.0.4/bin$ hadoop namenode -format*
> *Warning: $HADOOP_HOME is deprecated.*
> *
> *
> *13/05/01 12:08:04 INFO namenode.NameNode: STARTUP_MSG: *
> */*
> *STARTUP_MSG: Starting NameNode*
> *STARTUP_MSG:   host = naidu/127.0.0.1*
> *STARTUP_MSG:   args = [-format]*
> *STARTUP_MSG:   version = 1.0.4*
> *STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
> 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012*
> */*
> *Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) y*
> *Format aborted in /home/naidu/dfs/namenode*
> *13/05/01 12:08:05 INFO namenode.NameNode: SHUTDOWN_MSG: *
> */*
> *SHUTDOWN_MSG: Shutting down NameNode at naidu/127.0.0.1*
> *
> *
> */*
>
> Can someone help me understand this? Why is it not possible to format the
> namenode multiple times?
>
>
> On Wed, May 1, 2013 at 9:47 AM, Cheolsoo Park 
> wrote:
>
> > Please see the following wiki page:
> > https://cwiki.apache.org/confluence/display/PIG/HowToContribute
> >
> > Thanks,
> > Cheolsoo
> >
> >
> > On Tue, Apr 30, 2013 at 9:10 PM, Naidu MS
> > wrote:
> >
> > > Hi How to get the source of pig?
> > > I am interested in going to source code so that i can learn how the
> > > framework is written.
> > > I can help in fixing some minor bugs/jira issues.
> > > Can some one help me how to get the source code ?
> > >
> > >
> > >
> > > Regards,
> > > Naidu
> > >
> > >
> > > On Wed, May 1, 2013 at 9:30 AM, Cheolsoo Park 
> > > wrote:
> > >
> > > > Welcome to Pig. There are hundreds of open jiras:
> > > >
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20PIG%20AND%20status%20%3D%20Open%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC
> > > >
> > > > Please feel free to submit patches.
> > > >
> > > > Thanks,
> > > > Cheolsoo
> > > >
> > > >
> > > >
> > > > On Tue, Apr 30, 2013 at 4:16 PM, Vineet Nair 
> > > wrote:
> > > >
> > > > > Hello all ,
> > > > >
> > > > > I was just going through the source code of Pig and I would very
> much
> > > > like
> > > > > to contribute to it.
> > > > > I was just wondering if there are any small Jira requests that i
> can
> > > > start
> > > > > working on.
> > > > >
> > > > > Thanks and regards,
> > > > > Vineet
> > > > >
> > > >
> > >
> >
>


[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-05-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646998#comment-13646998
 ] 

Daniel Dai commented on PIG-3285:
-

Agree with Rohini, that should be a simple fix to add protobuf.jar. We don't 
need to double ship jars and make things complicated.

> Jobs using HBaseStorage fail to ship dependency jars
> 
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
>  Issue Type: Bug
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:266)
>   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Description: 
The physical operators are sometimes overly complex. I'm trying to clean up
some unnecessary code. In particular, there is an array of getNext(*T* v)
methods where the value v does not seem to have any importance and is just used
to pick the correct method.
I have started a refactoring for a more readable getNext*T*().


  was:
The physical operators are sometimes overly complex. I'm trying to cleanup some 
unnecessary code.
in particular there is an array of getNext(*T* v) where the value v does not 
seem to have any importance and is just use to pick the correct method.
I have started a refactoring for a more readable getNext*T*().



> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to clean up 
> some unnecessary code.
> In particular, there is an array of getNext(*T* v) methods where the value v
> does not seem to have any importance and is just used to pick the correct
> method.
> I have started a refactoring for a more readable getNext*T*().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_0.patch

PIG-3307_0.patch contains the initial refactoring

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to clean up 
> some unnecessary code.
> In particular, there is an array of getNext(*T* v) methods where the value v
> does not seem to have any importance and is just used to pick the correct
> method.
> I have started a refactoring for a more readable getNext*T*().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3307:
--

 Summary: Refactor physical operators to remove methods parameters 
that are always null
 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch

The physical operators are sometimes overly complex. I'm trying to clean up
some unnecessary code. In particular, there is an array of getNext(*T* v)
methods where the value v does not seem to have any importance and is just used
to pick the correct method.
I have started a refactoring for a more readable getNext*T*().
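A minimal illustration of the refactoring described above; the class, method bodies, and return values are invented for this sketch and are not Pig's actual operator code:

```java
public class GetNextRefactorSketch {
    // Before: the value parameter exists only to select an overload; its
    // content is never read, so call sites pass null or a dummy value.
    public static String getNext(Integer ignored) { return "int path"; }
    public static String getNext(String ignored)  { return "string path"; }

    // After: the type moves into the method name and the dead parameter
    // disappears from every call site.
    public static String getNextInteger() { return "int path"; }
    public static String getNextString()  { return "string path"; }

    public static void main(String[] args) {
        // Both styles dispatch to the same logic; the refactored form reads
        // better because no meaningless argument is threaded through.
        System.out.println(getNext((Integer) null)); // prints "int path"
        System.out.println(getNextInteger());        // prints "int path"
    }
}
```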


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-3306) Publish h2 artifact to maven

2013-05-01 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham resolved PIG-3306.
--

Resolution: Not A Problem

Yup [~rohini] you're right we already do that.

I should have known, I've published the last two releases. :)

> Publish h2 artifact to maven
> 
>
> Key: PIG-3306
> URL: https://issues.apache.org/jira/browse/PIG-3306
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>
> The Pig artifact built with hadoopversion=23 should be published to maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3303.


   Resolution: Fixed
Fix Version/s: 0.12

Merged in trunk

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.12
>
> Attachments: PIG-3303.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3306) Publish h2 artifact to maven

2013-05-01 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646796#comment-13646796
 ] 

Rohini Palaniswamy commented on PIG-3306:
-

Is this jira for ivy-publish-local?

> Publish h2 artifact to maven
> 
>
> Key: PIG-3306
> URL: https://issues.apache.org/jira/browse/PIG-3306
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>
> The Pig artifact built with hadoopversion=23 should be published to maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3306) Publish h2 artifact to maven

2013-05-01 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646784#comment-13646784
 ] 

Rohini Palaniswamy commented on PIG-3306:
-

Isn't it already published? 
http://repo1.maven.org/maven2/org/apache/pig/pig/0.10.1/ and 
http://repo1.maven.org/maven2/org/apache/pig/pig/0.11.1/ have those jars. 

> Publish h2 artifact to maven
> 
>
> Key: PIG-3306
> URL: https://issues.apache.org/jira/browse/PIG-3306
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>
> The Pig artifact built with hadoopversion=23 should be published to maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-05-01 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646720#comment-13646720
 ] 

Bill Graham commented on PIG-3303:
--

+1

Created PIG-3306 for publishing the h2 artifact to maven. 

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3303.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-05-01 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3303:
-

Assignee: Julien Le Dem

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3303.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3306) Publish h2 artifact to maven

2013-05-01 Thread Bill Graham (JIRA)
Bill Graham created PIG-3306:


 Summary: Publish h2 artifact to maven
 Key: PIG-3306
 URL: https://issues.apache.org/jira/browse/PIG-3306
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham


The Pig artifact built with hadoopversion=23 should be published to maven.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-05-01 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646715#comment-13646715
 ] 

Rohini Palaniswamy commented on PIG-3285:
-

Actually I got a little confused with the patch, as 
TableMapReduceUtil.addDependencyJars(job) was adding all those, and I focused 
just on Daniel's comment. If your intention is to only add the protobuf jar, you 
can do a Class.forName(some protobuf class name) and, if that does not throw a 
CNFE (a CNFE would mean an older HBase version), you can pass that class to 
TableMapReduceUtil.addDependencyJars(Configuration conf, Class... classes)
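The probe described in the comment above can be sketched as follows. This shows only the Class.forName detection step; in the actual fix the detected class would then be handed to TableMapReduceUtil.addDependencyJars so its containing jar ships with the job. The helper class and method names here are invented for illustration:

```java
public class DependencyDetectionSketch {
    // Returns true when the named class can be loaded from the current
    // classpath, false on ClassNotFoundException (e.g. an older HBase
    // release without a protobuf dependency).
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is always present; the protobuf probe's
        // result depends on which jars the JVM was started with.
        System.out.println(isOnClasspath("java.util.ArrayList")); // prints true
        System.out.println(isOnClasspath("com.google.protobuf.Message"));
    }
}
```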

> Jobs using HBaseStorage fail to ship dependency jars
> 
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
>  Issue Type: Bug
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:266)
>   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-05-01 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646710#comment-13646710
 ] 

Rohini Palaniswamy commented on PIG-3303:
-

+1

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
> Attachments: PIG-3303.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3305) Infinite loop when input path contains empty partition directory

2013-05-01 Thread Marcin Czech (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Czech updated PIG-3305:
--

Attachment: PIG-3305.patch

> Infinite loop when input path contains empty partition directory 
> -
>
> Key: PIG-3305
> URL: https://issues.apache.org/jira/browse/PIG-3305
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.10.1
>Reporter: Marcin Czech
>Priority: Critical
> Fix For: 0.10.1
>
> Attachments: PIG-3305.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3305) Infinite loop when input path contains empty partition directory

2013-05-01 Thread Marcin Czech (JIRA)
Marcin Czech created PIG-3305:
-

 Summary: Infinite loop when input path contains empty partition 
directory 
 Key: PIG-3305
 URL: https://issues.apache.org/jira/browse/PIG-3305
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Marcin Czech
Priority: Critical
 Fix For: 0.10.1




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table

2013-05-01 Thread Marcin Czech (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Czech updated PIG-3097:
--

Attachment: PIG-3097.patch

> HiveColumnarLoader doesn't correctly load partitioned Hive table 
> -
>
> Key: PIG-3097
> URL: https://issues.apache.org/jira/browse/PIG-3097
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.1
>Reporter: Richard Ding
>Assignee: Richard Ding
>  Labels: patch
> Attachments: PIG-3097.patch
>
>
> Given a partitioned Hive table:
> {code}
> hive> describe mytable;
> OK
> f1              string
> f2              string
> f3              string
> partition_dt    string
> {code}
> The following Pig script gives the correct schema:
> {code}
> grunt> A = load '/hive/warehouse/mytable' using 
> org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 string,f3 
> string');
> grunt> describe A
> A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray}
> {code}
> But, the command
> {code}
> grunt> dump A
> {code}
> only produces the first column of all records in the table (all four columns 
> are expected).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table

2013-05-01 Thread Marcin Czech (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Czech updated PIG-3097:
--

   Labels: patch  (was: )
Affects Version/s: 0.10.1
   Status: Patch Available  (was: Open)

We are using version 0.10.1, so this fix targets that version. The fix is 
extremely simple, so it should be easy to port it to trunk.

> HiveColumnarLoader doesn't correctly load partitioned Hive table 
> -
>
> Key: PIG-3097
> URL: https://issues.apache.org/jira/browse/PIG-3097
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.1
>Reporter: Richard Ding
>Assignee: Richard Ding
>  Labels: patch
>
> Given a partitioned Hive table:
> {code}
> hive> describe mytable;
> OK
> f1              string
> f2              string
> f3              string
> partition_dt    string
> {code}
> The following Pig script gives the correct schema:
> {code}
> grunt> A = load '/hive/warehouse/mytable' using 
> org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 string,f3 
> string');
> grunt> describe A
> A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray}
> {code}
> But, the command
> {code}
> grunt> dump A
> {code}
> only produces the first column of all records in the table (all four columns 
> are expected).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Want to contribute

2013-05-01 Thread Naidu MS
Hi, I have two questions regarding HDFS and the jps utility

I am new to Hadoop and started learning Hadoop over the past week

1. Whenever I run start-all.sh and then jps in the console, it shows the
processes started

*naidu@naidu:~/work/hadoop-1.0.4/bin$ jps*
*22283 NameNode*
*23516 TaskTracker*
*26711 Jps*
*22541 DataNode*
*23255 JobTracker*
*22813 SecondaryNameNode*
*Could not synchronize with target*

But along with the list of processes started, it always shows *"Could not
synchronize with target"* in the jps output. What is meant by "Could not
synchronize with target"? Can someone explain why this is happening?


2. Is it possible to format the namenode multiple times? When I enter the
namenode -format command, it does not format the namenode and shows the
following output.

*naidu@naidu:~/work/hadoop-1.0.4/bin$ hadoop namenode -format*
*Warning: $HADOOP_HOME is deprecated.*
*
*
*13/05/01 12:08:04 INFO namenode.NameNode: STARTUP_MSG: *
*/*
*STARTUP_MSG: Starting NameNode*
*STARTUP_MSG:   host = naidu/127.0.0.1*
*STARTUP_MSG:   args = [-format]*
*STARTUP_MSG:   version = 1.0.4*
*STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012*
*/*
*Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) y*
*Format aborted in /home/naidu/dfs/namenode*
*13/05/01 12:08:05 INFO namenode.NameNode: SHUTDOWN_MSG: *
*/*
*SHUTDOWN_MSG: Shutting down NameNode at naidu/127.0.0.1*
*
*
*/*

Can someone help me understand this? Why is it not possible to format the
namenode multiple times?


On Wed, May 1, 2013 at 9:47 AM, Cheolsoo Park  wrote:

> Please see the following wiki page:
> https://cwiki.apache.org/confluence/display/PIG/HowToContribute
>
> Thanks,
> Cheolsoo
>
>
> On Tue, Apr 30, 2013 at 9:10 PM, Naidu MS
> wrote:
>
> > Hi How to get the source of pig?
> > I am interested in going to source code so that i can learn how the
> > framework is written.
> > I can help in fixing some minor bugs/jira issues.
> > Can some one help me how to get the source code ?
> >
> >
> >
> > Regards,
> > Naidu
> >
> >
> > On Wed, May 1, 2013 at 9:30 AM, Cheolsoo Park 
> > wrote:
> >
> > > Welcome to Pig. There are hundreds of open jiras:
> > >
> > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20PIG%20AND%20status%20%3D%20Open%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC
> > >
> > > Please feel free to submit patches.
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > >
> > >
> > > On Tue, Apr 30, 2013 at 4:16 PM, Vineet Nair 
> > wrote:
> > >
> > > > Hello all ,
> > > >
> > > > I was just going through the source code of Pig and I would very much
> > > like
> > > > to contribute to it.
> > > > I was just wondering if there are any small Jira requests that i can
> > > start
> > > > working on.
> > > >
> > > > Thanks and regards,
> > > > Vineet
> > > >
> > >
> >
>