[jira] [Commented] (PIG-1748) Add load/store function AvroStorage for avro data

2012-09-13 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454985#comment-13454985
 ] 

Jakob Homan commented on PIG-1748:
--

@deb - questions like these should be directed to the pig user list, not JIRA.  
You'll receive assistance there.

 Add load/store function AvroStorage for avro data
 -

 Key: PIG-1748
 URL: https://issues.apache.org/jira/browse/PIG-1748
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: lin guo
Assignee: lin guo
 Fix For: 0.9.0

 Attachments: avro_storage.patch, AvroStorageUtils-bagfix.patch, 
 avro_test_files.tar.gz, PIG-1748-2.patch, PIG-1748-3.patch


 We want to use Pig to process arbitrary Avro data and store results as Avro 
 files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
 Due to discrepancies of Avro and Pig data models, AvroStorage has:
 1. Limited support for record: we do not support recursively defined record 
 because the number of fields in such records is data dependent.
 2. Limited support for union: we only accept nullable union like [null, 
 some-type].
 For simplicity, we also make the following assumptions:
 If the input directory is a leaf directory, then we assume Avro data files in 
 it have the same schema;
 If the input directory contains sub-directories, then we assume Avro data 
 files in all sub-directories have the same schema.
 AvroStorage takes no input parameters when used as a LoadFunc (except for 
 debug [debug-level]). 
 Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
 don't, Avro schema of output data is derived from its 
 Pig schema.
 Detailed documentation can be found in 
 http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
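
 The nullable-union restriction described above can be illustrated with a small,
 self-contained Java sketch. It does not use the Avro or Pig APIs. The class name
 UnionCheck, its method names, and the representation of a union as a list of
 type names are assumptions made for illustration, and the real AvroStorage code
 may differ (for example, in whether it accepts null in the second position).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of AvroStorage: models a union as a list of type names.
class UnionCheck {

    // Returns true for a two-branch union in which exactly one branch is "null".
    static boolean isNullableUnion(List<String> branches) {
        if (branches == null || branches.size() != 2) {
            return false;
        }
        boolean firstNull = "null".equals(branches.get(0));
        boolean secondNull = "null".equals(branches.get(1));
        return firstNull != secondNull;
    }

    // Returns the non-null branch of a nullable union.
    static String nonNullBranch(List<String> branches) {
        if (!isNullableUnion(branches)) {
            throw new IllegalArgumentException("not a nullable union: " + branches);
        }
        return "null".equals(branches.get(0)) ? branches.get(1) : branches.get(0);
    }

    public static void main(String[] args) {
        System.out.println(isNullableUnion(Arrays.asList("null", "string"))); // true
        System.out.println(isNullableUnion(Arrays.asList("int", "string")));  // false
        System.out.println(nonNullBranch(Arrays.asList("null", "long")));     // long
    }
}
```

 Under this model, a union such as [int, string] is rejected because neither
 branch is null, so it cannot be mapped to a single nullable field.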

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2012-08-06 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429456#comment-13429456
 ] 

Jakob Homan commented on PIG-1891:
--

This looks good to me.  +1 on the patch, for what it's worth.  This is what 
we're looking for.  [~billgraham], how does this look to you?

 Enable StoreFunc to make intelligent decision based on job success or failure
 -

 Key: PIG-1891
 URL: https://issues.apache.org/jira/browse/PIG-1891
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.10.0
Reporter: Alex Rovner
Priority: Minor
  Labels: patch
 Attachments: PIG-1891-1.patch


 We are in the process of using Pig for various data processing and component 
 integration tasks. Here is where we feel Pig storage funcs are lacking:
 They are not aware of whether the overall job has succeeded. This creates a 
 problem for storage funcs that need to upload results into another system:
 a DB, FTP, another file system, etc.
 I looked at DBStorage in the piggybank 
 (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
 and what I see is essentially a mechanism that does the following for each 
 task:
 1. Creates a record writer (in this case, opens a connection to the DB).
 2. Opens a transaction.
 3. Writes records into a batch.
 4. Executes a commit or rollback depending on whether the task was successful.
 While this approach works well at the task level, it does not work at all at 
 the job level. 
 If some tasks succeed but the overall job fails, partial records are 
 going to get uploaded into the DB.
 Any ideas on a workaround? 
 Our current workaround is fairly ugly: we created a Java wrapper that 
 launches Pig jobs and then uploads to the DB once the Pig job is successful. 
 While the approach works, it's not really integrated into Pig.
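
 The gap between task-level and job-level commit described above can be sketched
 in self-contained Java. This is only a model of the idea, not Pig's StoreFunc
 API and not the attached patch: the class StagedStore and the methods
 commitTask, onJobSuccess, and onJobFailure are hypothetical names, and an
 in-memory list stands in for the external database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Pig's StoreFunc API: tasks stage their output, and nothing
// reaches the external system until the whole job is known to have succeeded.
class StagedStore {
    private final Map<String, List<String>> staged = new HashMap<>(); // taskId -> records
    private final List<String> published = new ArrayList<>();         // stands in for the DB

    // Task-level commit: keep the task's records in a staging area only.
    void commitTask(String taskId, List<String> records) {
        staged.put(taskId, new ArrayList<>(records));
    }

    // Job-level hook on success: publish everything that was staged.
    void onJobSuccess() {
        for (List<String> records : staged.values()) {
            published.addAll(records);
        }
        staged.clear();
    }

    // Job-level hook on failure: discard staged output so no partial results leak.
    void onJobFailure() {
        staged.clear();
    }

    List<String> published() {
        return published;
    }

    public static void main(String[] args) {
        StagedStore store = new StagedStore();
        store.commitTask("task-0", Arrays.asList("a", "b"));
        store.onJobFailure();                          // some other task failed the job
        System.out.println(store.published());         // []

        store.commitTask("task-0", Arrays.asList("a", "b"));
        store.commitTask("task-1", Arrays.asList("c"));
        store.onJobSuccess();
        System.out.println(store.published().size());  // 3
    }
}
```

 The point of the sketch is that a successful task alone never makes records
 visible; only the job-level callback does, which is the information the storage
 func is missing today.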





[jira] [Updated] (PIG-2031) NPE in TOP

2011-08-30 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-2031:
-

Assignee: Jacob Perkins

 NPE in TOP
 --

 Key: PIG-2031
 URL: https://issues.apache.org/jira/browse/PIG-2031
 Project: Pig
  Issue Type: Bug
Reporter: Jacob Perkins
Assignee: Jacob Perkins
  Labels: newbie
 Attachments: toppatch.txt


 If a NULL DataBag is passed to org.apache.pig.builtin.TOP then an NPE is 
 thrown. Consider:
 {code}
 $: cat foo.tsv
 a  {(foo,1),(bar,2)}
 b
 c  {(fyha,4),(asdf,9)}
 {code}
 then:
 {code}
 data  = LOAD 'foo.tsv' AS (key:chararray, a_bag:bag {t:tuple (name:chararray, value:int)});
 tpd   = FOREACH data {
   top_n = TOP(1, 1, a_bag);
   GENERATE
     key   AS key,
     top_n AS top_n
   ;
 };
 DUMP tpd;
 {code}
 will throw an NPE when it gets to the row with no bag.
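
 The kind of guard that avoids this failure can be sketched in self-contained
 Java. This is not the attached patch and does not use Pig's DataBag or Tuple
 types: the class NullSafeTop and its top method are hypothetical, and returning
 null for a null bag is an assumption about the desired behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a TOP-like function; it does not use Pig's data types.
class NullSafeTop {

    // Returns the n largest values in descending order, or null when the bag is null.
    static List<Integer> top(int n, List<Integer> bag) {
        if (bag == null) {
            return null; // assumption: a null bag yields a null result instead of an NPE
        }
        if (n <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap of at most n values
        for (Integer value : bag) {
            if (value == null) {
                continue; // skip null elements
            }
            heap.add(value);
            if (heap.size() > n) {
                heap.poll(); // drop the current smallest
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(top(1, Arrays.asList(1, 2))); // [2]
        System.out.println(top(1, null));                // null
    }
}
```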





[jira] [Updated] (PIG-1748) Add load/store function AvroStorage for avro data

2011-07-05 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1748:
-

Assignee: lin guo  (was: Jakob Homan)

 Add load/store function AvroStorage for avro data
 -

 Key: PIG-1748
 URL: https://issues.apache.org/jira/browse/PIG-1748
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: lin guo
Assignee: lin guo
 Fix For: 0.9.0

 Attachments: PIG-1748-2.patch, PIG-1748-3.patch, avro_storage.patch, 
 avro_test_files.tar.gz







[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-05-09 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030845#comment-13030845
 ] 

Jakob Homan commented on PIG-1890:
--

@Ken - any update now that we're in a new week?

 Fix piggybank unit test TestAvroStorage
 ---

 Key: PIG-1890
 URL: https://issues.apache.org/jira/browse/PIG-1890
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Jakob Homan
 Fix For: 0.9.0

 Attachments: PIG-1890-1.patch


 TestAvroStorage fails on trunk. There are two reasons:
 1. After PIG-1680, we call LoadFunc.setLocation one more time.
 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
 case, testArrayDefault, the schema for in is set to PIG_WRAPPER: (FIELD: 
 {PIG_WRAPPER: (ARRAY_ELEM: float)}). It seems PIG_WRAPPER is redundant. This 
 issue was hidden until PIG-1188 was checked in.



[jira] Commented: (PIG-1872) Fix bug in AvroStorage

2011-02-25 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999689#comment-12999689
 ] 

Jakob Homan commented on PIG-1872:
--

+1.  Looks good to me.

 Fix bug in AvroStorage
 --

 Key: PIG-1872
 URL: https://issues.apache.org/jira/browse/PIG-1872
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: lin guo
Priority: Minor
 Fix For: 0.9.0

 Attachments: my.diff


 AvroStorageUtils.containsRecursiveRecord() has a bug and returns true for a 
 record with multiple fields of the same type, e.g.
  { type:record, name:Event,
    fields:[{name:f1, type:{ type:record, name:EntityID, }},
            {name:f2, type:EntityID},
            {name:f3, type:EntityID} ]}

 Patch contains bug fix and unit tests.
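
 One way to avoid this kind of false positive is to track only the records on
 the current path of the traversal, rather than every record seen so far. The
 self-contained Java sketch below illustrates that idea. It is not the attached
 patch and does not use the Avro Schema API: the RecordType model is
 hypothetical, and the assumption that the bug came from a global "seen" set is
 a guess, since the buggy code is not shown here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; RecordType is a minimal stand-in for a named record schema.
class RecursionCheck {

    static class RecordType {
        final String name;
        final List<RecordType> fieldTypes = new ArrayList<>();

        RecordType(String name) {
            this.name = name;
        }
    }

    // True only if some record is reachable from itself through its fields.
    static boolean containsRecursiveRecord(RecordType root) {
        return visit(root, new HashSet<>());
    }

    private static boolean visit(RecordType type, Set<String> onPath) {
        if (!onPath.add(type.name)) {
            return true; // the record is its own ancestor: genuine recursion
        }
        for (RecordType field : type.fieldTypes) {
            if (visit(field, onPath)) {
                return true;
            }
        }
        onPath.remove(type.name); // backtrack so sibling fields of the same type are not flagged
        return false;
    }

    public static void main(String[] args) {
        RecordType entityId = new RecordType("EntityID");
        RecordType event = new RecordType("Event");
        event.fieldTypes.add(entityId); // f1
        event.fieldTypes.add(entityId); // f2
        event.fieldTypes.add(entityId); // f3
        System.out.println(containsRecursiveRecord(event)); // false

        RecordType node = new RecordType("Node");
        node.fieldTypes.add(node); // a record that refers to itself
        System.out.println(containsRecursiveRecord(node));  // true
    }
}
```

 Without the backtracking step, the second and third EntityID fields would look
 like repeats of an already visited record, which matches the symptom reported
 above.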





[jira] Updated: (PIG-1872) Fix bug in AvroStorage

2011-02-25 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1872:
-

Fix Version/s: (was: 0.9.0)
   site
Affects Version/s: (was: 0.9.0)
   site
   Status: Patch Available  (was: Open)

 Fix bug in AvroStorage
 --

 Key: PIG-1872
 URL: https://issues.apache.org/jira/browse/PIG-1872
 Project: Pig
  Issue Type: Bug
Affects Versions: site
Reporter: lin guo
Priority: Minor
 Fix For: site

 Attachments: my.diff







[jira] Updated: (PIG-1872) Fix bug in AvroStorage

2011-02-25 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1872:
-

Priority: Major  (was: Minor)

 Fix bug in AvroStorage
 --

 Key: PIG-1872
 URL: https://issues.apache.org/jira/browse/PIG-1872
 Project: Pig
  Issue Type: Bug
Affects Versions: site
Reporter: lin guo
 Fix For: site

 Attachments: my.diff







[jira] Created: (PIG-1833) Contrib's build.xml points to an invalid hadoop-conf

2011-01-28 Thread Jakob Homan (JIRA)
Contrib's build.xml points to an invalid hadoop-conf


 Key: PIG-1833
 URL: https://issues.apache.org/jira/browse/PIG-1833
 Project: Pig
  Issue Type: Bug
Reporter: Jakob Homan


As discovered in testing PIG-1748, the build.xml in the contrib/piggybank/java 
module has {{junit.hadoop.conf}}, which points to 
{{${user.home}/pigtest/conf/}}.  In this directory is a hadoop-conf.xml that 
defines a value for {{fs.default.name}} which is valid during the regular test 
runs but not for the contrib modules.  Consequently, any tests in contrib that 
try to access a file via FileSystem without a fully qualified path will be 
routed to this value and will then fail when they can't reach it.  If, however, 
one runs the tests directly from the contrib module without the pigtest 
directory existing, the tests will pass.  Do any of the contrib modules 
actually need this variable?  
If not, it should be removed.
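
The routing problem can be illustrated with a short, self-contained Java sketch 
built on java.net.URI. It only mimics how an unqualified path inherits the 
default file system and is not Hadoop's FileSystem code. The class QualifyPath, 
the qualify method, and the host and port in the default URI are assumptions 
made for illustration.

```java
import java.net.URI;

// Hypothetical illustration; this mimics, but is not, Hadoop's path qualification.
class QualifyPath {

    // Paths that already carry a scheme are left alone; others inherit the default file system.
    static URI qualify(URI defaultFs, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri;
        }
        return defaultFs.resolve(path);
    }

    public static void main(String[] args) {
        // Assumed stand-in for the fs.default.name value picked up from pigtest/conf.
        URI defaultFs = URI.create("hdfs://localhost:9000/");

        // An unqualified path is routed to the default file system, reachable or not.
        System.out.println(qualify(defaultFs, "/tmp/input.avro"));

        // A fully qualified path ignores the default.
        System.out.println(qualify(defaultFs, "file:///tmp/input.avro"));
    }
}
```

In this model a contrib test that opens /tmp/input.avro ends up addressing the 
configured HDFS instance, which is why the outcome depends on whether the 
pigtest configuration is present.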




[jira] Commented: (PIG-1748) Add load/store function AvroStorage for avro data

2011-01-24 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985972#action_12985972
 ] 

Jakob Homan commented on PIG-1748:
--

@Scott
I can't say I'm convinced, and am in fact more concerned from your example, 
given that this approach essentially builds dependencies on all of those 
projects into Avro.  However, this JIRA isn't the best place to discuss this.  
Is there a discussion about this type of integration going on in Avro that the 
community can contribute to?  Is there a JIRA?  Thanks.

 Add load/store function AvroStorage for avro data
 -

 Key: PIG-1748
 URL: https://issues.apache.org/jira/browse/PIG-1748
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: lin guo
Assignee: Jakob Homan
 Attachments: avro_storage.patch, avro_test_files.tar.gz, 
 PIG-1748-2.patch


 We want to use Pig to process arbitrary Avro data and store results as Avro 
 files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
 Due to discrepancies of Avro and Pig data models, AvroStorage has:
 1. Limited support for record: we do not support recursively defined record 
 because the number of fields in such records is data dependent.
 2. Limited support for union: we only accept nullable union like [null, 
 some-type].
 For simplicity, we also make the following assumptions:
 If the input directory is a leaf directory, then we assume Avro data files in 
 it have the same schema;
 If the input directory contains sub-directories, then we assume Avro data 
 files in all sub-directories have the same schema.
 AvroStorage takes no input parameters when used as a LoadFunc (except for 
 debug [debug-level]). 
 Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
 don't, Avro schema of output data is derived from its 
 Pig schema.
 Detailed documentation can be found in 
 http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data




[jira] Updated: (PIG-1748) Add load/store function AvroStorage for avro data

2011-01-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1748:
-

Attachment: avro_test_files.tar.gz

Attaching binary test avro files used by unit tests.  Need to be untgz'ed and 
placed in 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files
 by reviewer/committer

 Add load/store function AvroStorage for avro data
 -

 Key: PIG-1748
 URL: https://issues.apache.org/jira/browse/PIG-1748
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: lin guo
 Attachments: avro_storage.patch, avro_test_files.tar.gz, 
 PIG-1748-2.patch






[jira] Updated: (PIG-1749) Update Pig parser so that function arguments can contain newline characters

2011-01-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1749:
-

Attachment: PIG-1749-2.patch

I'll be finishing this patch for Lin.  Updated patch.  Added test for 
TestQueryParser, which required updating the new grammar (my grammar skills are 
a bit rusty), removed reference to AvroStorage, but left in its details.  
These references don't actually call out to the class and are just used for 
filler purposes.  

 Update Pig parser so that function arguments can contain newline characters
 ---

 Key: PIG-1749
 URL: https://issues.apache.org/jira/browse/PIG-1749
 Project: Pig
  Issue Type: Improvement
Reporter: lin guo
 Attachments: parser.patch, PIG-1749-2.patch


 We want to add this feature so that users can put long function argument 
 strings in multiple lines. PIG-1748 depends on this. 
