[jira] [Updated] (SPARK-24855) Built-in AVRO support should support specified schema on write

2020-03-16 Thread Dongjoon Hyun (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-24855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24855:
--
Affects Version/s: 3.1.0 (was: 3.0.0)

> Built-in AVRO support should support specified schema on write
> --
>
> Key: SPARK-24855
> URL: https://issues.apache.org/jira/browse/SPARK-24855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Brian Lindblom
>Assignee: Brian Lindblom
>Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> spark-avro appears to have been brought in from an upstream project, 
> [https://github.com/databricks/spark-avro]. I opened a PR a while ago to 
> enable support for 'forceSchema', which allows us to specify an AVRO schema 
> with which to write our records, to handle some use cases we have. I didn't 
> get this code merged, but I would like to add this feature to the AVRO 
> reader/writer code that was brought in. The PR is here, and I will follow up 
> with a more formal PR/patch rebased on the Spark master branch: 
> https://github.com/databricks/spark-avro/pull/222
>  
> This change allows us to specify a schema, which should be compatible with 
> the schema generated by spark-avro from the dataset definition. This lets a 
> user specify default values, change union ordering, or preserve the original 
> schema in the output container files when reading in an AVRO data set, doing 
> some in-line field cleansing, and then writing it back out. I've had several 
> use cases where this behavior was desired, and there were several other asks 
> for this in the spark-avro project.
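
For illustration, here is a minimal sketch of the read-cleanse-rewrite flow the
description asks for. It assumes the built-in Avro data source (format "avro")
and its user-supplied-schema write option, later documented as 'avroSchema'
(the databricks/spark-avro PR exposed it as 'forceSchema'); the record schema,
the paths, and the trim-based cleansing step are made up for this example.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.trim

object AvroSchemaOnWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("avro-schema-on-write").getOrCreate()

    // Hypothetical schema taken from the original container files: it pins the
    // default value and the union ordering that spark-avro would otherwise
    // re-derive from the Dataset definition.
    val originalSchemaJson =
      """{
        |  "type": "record",
        |  "name": "User",
        |  "fields": [
        |    {"name": "id",   "type": "long"},
        |    {"name": "name", "type": ["null", "string"], "default": null}
        |  ]
        |}""".stripMargin

    // Read the AVRO data set, cleanse a field in-line, then write it back out
    // with the original schema so the output container files preserve it.
    val df = spark.read.format("avro").load("/path/to/input")    // hypothetical path
    val cleansed = df.withColumn("name", trim(df("name")))

    cleansed.write
      .format("avro")
      .option("avroSchema", originalSchemaJson)   // write with the specified schema
      .save("/path/to/output")                    // hypothetical path

    spark.stop()
  }
}

Without such an option, spark-avro derives the output schema from the Dataset's
column types, so the defaults and union ordering carried by the input files are
lost on rewrite.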



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (SPARK-24855) Built-in AVRO support should support specified schema on write

2019-07-16 Thread Dongjoon Hyun (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-24855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24855:
--
Affects Version/s: 3.0.0 (was: 2.4.0)

> Built-in AVRO support should support specified schema on write
> --
>
> Key: SPARK-24855
> URL: https://issues.apache.org/jira/browse/SPARK-24855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Brian Lindblom
>Assignee: Brian Lindblom
>Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> spark-avro appears to have been brought in from an upstream project, 
> [https://github.com/databricks/spark-avro]. I opened a PR a while ago to 
> enable support for 'forceSchema', which allows us to specify an AVRO schema 
> with which to write our records, to handle some use cases we have. I didn't 
> get this code merged, but I would like to add this feature to the AVRO 
> reader/writer code that was brought in. The PR is here, and I will follow up 
> with a more formal PR/patch rebased on the Spark master branch: 
> https://github.com/databricks/spark-avro/pull/222
>  
> This change allows us to specify a schema, which should be compatible with 
> the schema generated by spark-avro from the dataset definition. This lets a 
> user specify default values, change union ordering, or preserve the original 
> schema in the output container files when reading in an AVRO data set, doing 
> some in-line field cleansing, and then writing it back out. I've had several 
> use cases where this behavior was desired, and there were several other asks 
> for this in the spark-avro project.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)




[jira] [Updated] (SPARK-24855) Built-in AVRO support should support specified schema on write

2018-07-18 Thread Brian Lindblom (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-24855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Lindblom updated SPARK-24855:
---
Description: 
spark-avro appears to have been brought in from an upstream project, 
[https://github.com/databricks/spark-avro]. I opened a PR a while ago to enable 
support for 'forceSchema', which allows us to specify an AVRO schema with which 
to write our records, to handle some use cases we have. I didn't get this code 
merged, but I would like to add this feature to the AVRO reader/writer code 
that was brought in. The PR is here, and I will follow up with a more formal 
PR/patch rebased on the Spark master branch: 
https://github.com/databricks/spark-avro/pull/222

This change allows us to specify a schema, which should be compatible with the 
schema generated by spark-avro from the dataset definition. This lets a user 
specify default values, change union ordering, or preserve the original schema 
in the output container files when reading in an AVRO data set, doing some 
in-line field cleansing, and then writing it back out. I've had several use 
cases where this behavior was desired, and there were several other asks for 
this in the spark-avro project.

  was:
spark-avro appears to have been brought in from an upstream project, 
[https://github.com/databricks/spark-avro]. I opened a PR a while ago to enable 
support for 'forceSchema', which allows us to specify an AVRO schema with which 
to write our records, to handle some use cases we have. I didn't get this code 
merged, but I would like to add this feature to the AVRO reader/writer code 
that was brought in. The PR is here, and I will follow up with a more formal 
PR/patch rebased on the Spark master branch.

This change allows us to specify a schema, which should be compatible with the 
schema generated by spark-avro from the dataset definition. This lets a user 
specify default values, change union ordering, or preserve the original schema 
in the output container files when reading in an AVRO data set, doing some 
in-line field cleansing, and then writing it back out. I've had several use 
cases where this behavior was desired, and there were several other asks for 
this in the spark-avro project.


> Built-in AVRO support should support specified schema on write
> --
>
> Key: SPARK-24855
> URL: https://issues.apache.org/jira/browse/SPARK-24855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Brian Lindblom
>Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> spark-avro appears to have been brought in from an upstream project, 
> [https://github.com/databricks/spark-avro]. I opened a PR a while ago to 
> enable support for 'forceSchema', which allows us to specify an AVRO schema 
> with which to write our records, to handle some use cases we have. I didn't 
> get this code merged, but I would like to add this feature to the AVRO 
> reader/writer code that was brought in. The PR is here, and I will follow up 
> with a more formal PR/patch rebased on the Spark master branch: 
> https://github.com/databricks/spark-avro/pull/222
>  
> This change allows us to specify a schema, which should be compatible with 
> the schema generated by spark-avro from the dataset definition. This lets a 
> user specify default values, change union ordering, or preserve the original 
> schema in the output container files when reading in an AVRO data set, doing 
> some in-line field cleansing, and then writing it back out. I've had several 
> use cases where this behavior was desired, and there were several other asks 
> for this in the spark-avro project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
