Re: Not able to resolve union for array type

2016-06-06 Thread Maulik Gandhi
Hi Giri,

Can you share a simplified implementation and the model you are trying
to emit?

Thanks.
- Maulik

On Sat, Jun 4, 2016 at 3:04 PM, Giri P  wrote:

> I need it the other way: either null or an array of records. I was able to
> insert non-null records, but when I try inserting nulls it throws an error.
>
> I'm using spark to write avro files
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Failed
> to serialize task 16, not attempting to retry it. Exception during
> serialization: java.io.NotSerializableException:
> org.apache.avro.Schema$UnionSchema
> Serialization stack:
> - object not serializable (class:
> org.apache.avro.Schema$UnionSchema, value:
> ["null",{"type":"array","items":{"type":"record","name":"MatchedAttrTest","namespace":"com.conversantmedia.data.cp.avro","fields":[{"name":"id","type":"long","doc":"Attribute
> id"},{"name":"updateDate","type":"long","doc":"Update date of the
> attribute"},{"name":"value","type":"string","doc":"Value of the
> attribute"}]}}])
> - writeObject data (class: java.lang.Throwable)
> - object (class org.apache.avro.UnresolvedUnionException,
> org.apache.avro.UnresolvedUnionException: Not in union
> ["null",{"type":"array","items":{"type":"record","name":"MatchedAttrTest","namespace":"com.conversantmedia.data.cp.avro","fields":[{"name":"id","type":"long","doc":"Attribute
> id"},{"name":"updateDate","type":"long","doc":"Update date of the
> attribute"},{"name":"value","type":"string","doc":"Value of the
> attribute"}]}}]: )
> - writeObject data (class: java.lang.Throwable)
> - object (class java.io.IOException, java.io.IOException:
> org.apache.avro.UnresolvedUnionException: Not in union
> ["null",{"type":"array","items":{"type":"record","name":"MatchedAttrTest","namespace":"com.conversantmedia.data.cp.avro","fields":[{"name":"id","type":"long","doc":"Attribute
> id"},{"name":"updateDate","type":"long","doc":"Update date of the
> attribute"},{"name":"value","type":"string","doc":"Value of the
> attribute"}]}}]: )
> - writeObject data (class:
> org.apache.spark.rdd.ParallelCollectionPartition)
> - object (class org.apache.spark.rdd.ParallelCollectionPartition,
> org.apache.spark.rdd.ParallelCollectionPartition@7b0)
> - element of array (index: 0)
> - array (class [Ljava.lang.Object;, size 10)
> - field (class: scala.collection.mutable.ArraySeq, name: array,
> type: class [Ljava.lang.Object;)
> - object (class scala.collection.mutable.ArraySeq,
> ArraySeq(org.apache.spark.rdd.ParallelCollectionPartition@7b0,
> org.apache.spark.rdd.ParallelCollectionPartition@7b1,
> org.apache.spark.rdd.ParallelCollectionPartition@7b2,
> org.apache.spark.rdd.ParallelCollectionPartition@7b3,
> org.apache.spark.rdd.ParallelCollectionPartition@7b4,
> org.apache.spark.rdd.ParallelCollectionPartition@7b5,
> org.apache.spark.rdd.ParallelCollectionPartition@7b6,
> org.apache.spark.rdd.ParallelCollectionPartition@7b7,
> org.apache.spark.rdd.ParallelCollectionPartition@7b8,
> org.apache.spark.rdd.ParallelCollectionPartition@7b9))
> - writeObject data (class:
> org.apache.spark.rdd.CoalescedRDDPartition)
> - object (class org.apache.spark.rdd.CoalescedRDDPartition,
> CoalescedRDDPartition(0,MapPartitionsRDD[8] at map at
> :81,[I@19cc3c95,None))
> - field (class: org.apache.spark.scheduler.ResultTask, name:
> partition, type: interface org.apache.spark.Partition)
> - object (class org.apache.spark.scheduler.ResultTask,
> ResultTask(4, 0))
>
>
> On Fri, Jun 3, 2016 at 10:53 PM, Doug Cutting  wrote:
>
>> Your schema permits null or an array of records. I suspect you want an
>> array containing nulls or records, e.g.,
>>
>> {"type":"array","items":["null",{"type":"record"...
>> On Jun 3, 2016 5:54 PM, "Giri P"  wrote:
>>
>>> Hi,
>>>
>>> I'm getting the error below when I try to insert null into a union with an array:
>>>
>>> Caused by: org.apache.avro.UnresolvedUnionException: Not in union
>>> ["null",{"type":"array","items":{"type":"record","name":"DeMatchedAttr","namespace":"cp","fields":[{"name":"id","type":"long","doc":"Attribute
>>> id"},{"name":"updateDate","type":"long","doc":"Update date of the
>>> attribute"},{"name":"value","type":"string","doc":"Value of the
>>> attribute"}]}}]: null
>>>
>>> Is there any issue with the schema ?
>>>
>>> Thanks for the help
>>>
>>> -Giri
>>>
>>
>
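For reference, Doug's suggestion written out as a complete schema — reusing the record fields from the error message above, so treat it as an illustrative sketch rather than the exact schema in use — would look like this, with the union moved inside the array's items:

```json
{
  "type": "array",
  "items": [
    "null",
    {
      "type": "record",
      "name": "MatchedAttrTest",
      "namespace": "com.conversantmedia.data.cp.avro",
      "fields": [
        {"name": "id", "type": "long", "doc": "Attribute id"},
        {"name": "updateDate", "type": "long", "doc": "Update date of the attribute"},
        {"name": "value", "type": "string", "doc": "Value of the attribute"}
      ]
    }
  ]
}
```

With the original schema (`["null", {"type": "array", ...}]`) only the whole field may be null; with this one each element of the array may be null instead.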


Re: Want to Add New Column in Avro Schema

2016-03-23 Thread Maulik Gandhi
The CREATE TABLE DDL looks right to me.

How are you updating *avro.schema.url* ?

Thanks.
- Maulik

On Wed, Mar 23, 2016 at 8:29 AM, Lunagariya, Dhaval <
dhaval.lunagar...@citi.com> wrote:

> Here is the DDL.
>
>
>
> DROP TABLE IF EXISTS TEST;
>
>
>
> CREATE EXTERNAL TABLE TEST
>
> PARTITIONED BY (
>
> COL1 STRING,
>
> COL2 STRING
>
> )
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>
> STORED AS
>
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>
> LOCATION 'hdfs:///data/hive/TEST'
>
> TBLPROPERTIES ('avro.schema.url'='hdfs:///user/Test.avsc');
>
>
>
> Thanks,
>
> Dhaval
>
>
>
> *From:* Aaron.Dossett [mailto:aaron.doss...@target.com]
> *Sent:* Wednesday, March 23, 2016 6:50 PM
> *To:* user@avro.apache.org
> *Cc:* 'er.dcpa...@gmail.com'
> *Subject:* Re: Want to Add New Column in Avro Schema
>
>
>
> You shouldn’t have to drop the table, just update the .avsc.  Can you
> share the DDL you use to create the table?
>
>
>
> *From: *"Lunagariya, Dhaval" 
> *Reply-To: *"user@avro.apache.org" 
> *Date: *Wednesday, March 23, 2016 at 8:17 AM
> *To: *"user@avro.apache.org" 
> *Cc: *"'er.dcpa...@gmail.com'" 
> *Subject: *RE: Want to Add New Column in Avro Schema
>
>
>
> Yes. I made the required changes in the .avsc file, dropped the table, and
> re-created it using the updated .avsc. But I am not getting the existing data
> in that case.
>
>
>
> Where am I wrong? Can you throw some light on this?
>
>
>
> Thanks,
>
> Dhaval
>
>
>
> *From:* Aaron.Dossett [mailto:aaron.doss...@target.com
> ]
> *Sent:* Wednesday, March 23, 2016 6:36 PM
> *To:* user@avro.apache.org
> *Cc:* 'er.dcpa...@gmail.com'
> *Subject:* Re: Want to Add New Column in Avro Schema
>
>
>
> If you create the external table by reference to the .avsc file
> (TBLPROPERTIES ('avro.schema.url'='hdfs://foo.avsc')) then all you have to
> do is update that .avsc file in a compatible way and Hive should reflect the
> new schema.  I've implemented this pattern in my production system for
> several months now.
>
>
>
> -Aaron
>
>
>
> *From: *"Lunagariya, Dhaval" 
> *Reply-To: *"user@avro.apache.org" 
> *Date: *Wednesday, March 23, 2016 at 6:32 AM
> *To: *"user@avro.apache.org" 
> *Cc: *"'er.dcpa...@gmail.com'" 
> *Subject: *Want to Add New Column in Avro Schema
>
>
>
> Hey folks,
>
>
>
> I want to add a new column to an existing Hive table. We created the external
> Hive table with the help of an .avsc file. Now I want to add a new column to
> that table.
>
>
>
> How can I do that without disturbing any data present in the table?
>
>
>
> Please Help.
>
>
>
> Regards,
>
> Dhaval
>
>
>
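A compatible update in the pattern Aaron describes is appending an optional field with a default to the record's field list in Test.avsc — for example (the field name here is hypothetical):

```json
{"name": "newCol", "type": ["null", "string"], "default": null}
```

Because the field has a default, readers using the new schema can still decode files written with the old one, which is what makes the in-place .avsc update safe.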


Re: Want to Add New Column in Avro Schema

2016-03-23 Thread Maulik Gandhi
You can try running "describe tableName;" and see whether the newly added
column appears in the Hive table.

Thanks.
- Maulik


On Wed, Mar 23, 2016 at 8:38 AM, Aaron.Dossett 
wrote:

> And what happens if you simply update the .avsc file on HDFS?  Does
> ‘describe test’ show the new columns?
>
> From: "Lunagariya, Dhaval" 
> Reply-To: "user@avro.apache.org" 
> Date: Wednesday, March 23, 2016 at 8:29 AM
>
> To: "user@avro.apache.org" 
> Cc: "'er.dcpa...@gmail.com'" 
> Subject: RE: Want to Add New Column in Avro Schema
>
> Here is the DDL.
>
>
>
> DROP TABLE IF EXISTS TEST;
>
>
>
> CREATE EXTERNAL TABLE TEST
>
> PARTITIONED BY (
>
> COL1 STRING,
>
> COL2 STRING
>
> )
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>
> STORED AS
>
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>
> LOCATION 'hdfs:///data/hive/TEST'
>
> TBLPROPERTIES ('avro.schema.url'='hdfs:///user/Test.avsc');
>
>
>
> Thanks,
>
> Dhaval
>
>
>
> *From:* Aaron.Dossett [mailto:aaron.doss...@target.com
> ]
> *Sent:* Wednesday, March 23, 2016 6:50 PM
> *To:* user@avro.apache.org
> *Cc:* 'er.dcpa...@gmail.com'
> *Subject:* Re: Want to Add New Column in Avro Schema
>
>
>
> You shouldn’t have to drop the table, just update the .avsc.  Can you
> share the DDL you use to create the table?
>
>
>
> *From: *"Lunagariya, Dhaval" 
> *Reply-To: *"user@avro.apache.org" 
> *Date: *Wednesday, March 23, 2016 at 8:17 AM
> *To: *"user@avro.apache.org" 
> *Cc: *"'er.dcpa...@gmail.com'" 
> *Subject: *RE: Want to Add New Column in Avro Schema
>
>
>
> Yes. I made the required changes in the .avsc file, dropped the table, and
> re-created it using the updated .avsc. But I am not getting the existing data
> in that case.
>
>
>
> Where am I wrong? Can you throw some light on this?
>
>
>
> Thanks,
>
> Dhaval
>
>
>
> *From:* Aaron.Dossett [mailto:aaron.doss...@target.com
> ]
> *Sent:* Wednesday, March 23, 2016 6:36 PM
> *To:* user@avro.apache.org
> *Cc:* 'er.dcpa...@gmail.com'
> *Subject:* Re: Want to Add New Column in Avro Schema
>
>
>
> If you create the external table by reference to the .avsc file
> (TBLPROPERTIES ('avro.schema.url'='hdfs://foo.avsc')) then all you have to
> do is update that .avsc file in a compatible way and Hive should reflect the
> new schema.  I've implemented this pattern in my production system for
> several months now.
>
>
>
> -Aaron
>
>
>
> *From: *"Lunagariya, Dhaval" 
> *Reply-To: *"user@avro.apache.org" 
> *Date: *Wednesday, March 23, 2016 at 6:32 AM
> *To: *"user@avro.apache.org" 
> *Cc: *"'er.dcpa...@gmail.com'" 
> *Subject: *Want to Add New Column in Avro Schema
>
>
>
> Hey folks,
>
>
>
> I want to add a new column to an existing Hive table. We created the external
> Hive table with the help of an .avsc file. Now I want to add a new column to
> that table.
>
>
>
> How can I do that without disturbing any data present in the table?
>
>
>
> Please Help.
>
>
>
> Regards,
>
> Dhaval
>
>
>


Re: Need Help

2016-01-11 Thread Maulik Gandhi
Hi Dhaval,

You can download the latest avro-tools jar and use the tools built into it to
convert the Avro file to JSON/text or other formats.  You will have to manually
copy the file from HDFS to your local machine and run avro-tools on it.
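Assuming avro-tools is already downloaded locally (the paths, file names, and version below are illustrative), the steps might look like:

```shell
# copy the Avro file out of HDFS to the local machine
# (for many GB-sized part files, hdfs dfs -getmerge is an alternative)
hdfs dfs -get /data/events/part-00000.avro .

# dump the records as JSON, one record per line
java -jar avro-tools-1.7.7.jar tojson part-00000.avro > part-00000.json
```

avro-tools also has a `getschema` subcommand if you only need the schema rather than the data.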

Hope that helps.

Thanks.
- Maulik



On Mon, Jan 11, 2016 at 10:22 AM, Dhaval Patel  wrote:

> Hi Folks,
>
> What is the best way to export a .avro file from HDFS to a flat file?
>
> Please note size of data is in GBs.
>
> --
>
> *Regards,*
> *Dhaval Lunagariya,*
> *Connect on LinkedIn .*
>


Re: Parsing avro binary data from Spark Streaming

2015-09-25 Thread Maulik Gandhi
Hi Daniel,


The code snippet below should help:

public <T extends SpecificRecord> T fromBytes(final byte[] bytes,
                                              final Class<T> clazz) {
    // wrap the raw bytes in a binary decoder and read them with a
    // SpecificDatumReader for the generated class
    final BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(bytes, 0, bytes.length, null);
    final DatumReader<T> datumReader = new SpecificDatumReader<>(clazz);
    try {
        return datumReader.read(null, decoder);
    } catch (final IOException e) {
        throw new IllegalStateException("Unable to deserialize avro " + clazz, e);
    }
}
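If the reader side does not have (or want) the generated class, the same bytes can be decoded to a GenericRecord instead — a sketch, assuming the writer's schema is available as a JSON string:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;

public static GenericRecord genericFromBytes(final byte[] bytes,
                                             final String schemaJson)
        throws IOException {
    // parse the writer's schema and decode the raw bytes against it
    final Schema schema = new Schema.Parser().parse(schemaJson);
    final DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
    final BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    return reader.read(null, decoder);
}
```

Either variant can be applied per element inside the DStream's map, since the byte arrays from the Kafka DefaultDecoder are exactly what binaryDecoder expects.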


Thanks.

- Maulik


On Fri, Sep 25, 2015 at 11:18 AM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Hi,
> I'm receiving avro data from Kafka in my Spark Streaming app.
> When reading the data directly from disk I would have just used the
> following manner to parse it:
> val avroRDD = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
> AvroInputFormat[GenericRecord]]("/incoming_1k").coalesce(10)
> val txtRDD = avroRDD.map(l => {l._1.datum.toString} )
>
> I would like to do the same with avro data coming in from kafka, so I'm
> doing the following:
> val avroStream = KafkaUtils.createDirectStream[Array[Byte], Array[Byte],
> DefaultDecoder, DefaultDecoder](ssc, kafkaParams, topicSet)
>
> This leaves me with a byte array and I can't find any example on how to
> convert a byte array to either a GenericRecord or to my avro class.
>
> Any help will be appreciated.
>
> Daniel
>


Feature: Clear all fields / Reset all fields to default value on Record template

2015-01-06 Thread Maulik Gandhi
Hello Avro Users,

Questions:

   1. I was wondering whether adding functionality for clearing all fields on
   a Record makes sense or not?
   2. I was wondering whether adding functionality for resetting all fields to
   their default values (the default value being what has been defined in the
   AVDL) on a Record makes sense or not?

I did look through the old mail archive and the JIRA queue, but could not
find anything similar; please point me to any documentation or links if I
missed them.

In order to achieve what I am asking here, my best guess is modifying the
existing Record template.  Please correct me if I am going down the wrong path.

*Record.vm*:
https://github.com/apache/avro/blob/branch-1.7/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm

Thanks for your help and awesome work!!

Thanks,
- Maulik


Re: Feature: Clear all fields / Reset all fields to default value on Record template

2015-01-06 Thread Maulik Gandhi
Sorry, I forgot to mention that I am working with the Builder way of creating
a Record, so I'll use the example of clearing all fields on the User record.

User.Builder userBuilder = User.newBuilder();

// Step-1: Logic: set fields on userBuilder, potentially emit User object
by building it userBuilder.build()
// Step-2: Clear the contents on userBuilder, by calling
userBuilder.clearAll() (new feature)
// Step-3: Go back to Step-1

*Avro Getting Started:*
http://avro.apache.org/docs/1.7.7/gettingstartedjava.html#Creating+Users
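To make the motivation concrete: today the generated builders only expose per-field clear methods, so reusing a builder means something like the sketch below (method names follow the Getting Started User schema; `clearAll()` is the proposed feature, not an existing API):

```java
// User is the generated class from the Getting Started schema
User.Builder userBuilder = User.newBuilder();
userBuilder.setName("Alice");
userBuilder.setFavoriteNumber(7);
User first = userBuilder.build();

// today each field has to be cleared individually...
userBuilder.clearName();
userBuilder.clearFavoriteNumber();
// ...where a single userBuilder.clearAll() (the proposed feature) would do.
```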

Thanks,
- Maulik

On Tue, Jan 6, 2015 at 3:33 PM, Maulik Gandhi mmg...@gmail.com wrote:

 Hello Avro Users,

 Questions:

1. I was wondering whether adding functionality for clearing all fields on
a Record makes sense or not?
2. I was wondering whether adding functionality for resetting all fields to
their default values (the default value being what has been defined in the
AVDL) on a Record makes sense or not?

 I did look through the old mail archive and the JIRA queue, but could not
 find anything similar; please point me to any documentation or links if I
 missed them.

 In order to achieve what I am asking here, my best guess is modifying the
 existing Record template.  Please correct me if I am going down the wrong path.

 *Record.vm*:
 https://github.com/apache/avro/blob/branch-1.7/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm

 Thanks for your help and awesome work!!

 Thanks,
 - Maulik