[jira] [Commented] (SPARK-33445) Can't parse decimal type from csv file

2020-11-18 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234773#comment-17234773
 ] 

Dongjoon Hyun commented on SPARK-33445:
---

[~bullsoverbears], I asked you to reopen this with a valid example. `schema`
works, too.

{code}
>>> mydf2 = spark.read.csv("tsd.csv", header=True, inferSchema=True)
>>> mydf2.schema
StructType(List(StructField(Epoch Miliseconds,DoubleType,true)))
>>> sc.version
'3.0.0'
{code}

{code}
>>> mydf2 = spark.read.csv("tsd.csv", header=True, inferSchema=True)
>>> mydf2.schema
StructType(List(StructField(Epoch Miliseconds,DoubleType,true)))
>>> sc.version
'3.0.1'
{code}
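
For anyone still hitting the original ValueError on 2.4.x, a minimal sketch of the usual workaround: supply an explicit schema instead of relying on inference. The decimal precision and scale below are assumptions; adjust them to the data.

{code}
>>> from pyspark.sql.types import StructType, StructField, DecimalType
>>> # Assumed schema for tsd.csv; DecimalType(20, 6) is a placeholder choice.
>>> schema = StructType([StructField("Epoch Miliseconds", DecimalType(20, 6), True)])
>>> mydf2 = spark.read.csv("tsd.csv", header=True, schema=schema)
>>> mydf2.schema
StructType(List(StructField(Epoch Miliseconds,DecimalType(20,6),true)))
{code}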

> Can't parse decimal type from csv file
> --
>
> Key: SPARK-33445
> URL: https://issues.apache.org/jira/browse/SPARK-33445
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 2.4.7, 3.0.0
>Reporter: Punit Shah
>Priority: Major
> Attachments: tsd.csv
>
>
> The attached file is a one-column CSV file containing decimals.
> Execute: {color:#de350b}mydf2 = spark_session.read.csv("tsd.csv",
> header=True, inferSchema=True){color}
> Then invoking {color:#de350b}mydf2.schema{color} will result in the error:
> {color:#ff8b00}ValueError: Could not parse datatype: decimal(6,-7){color}






[jira] [Resolved] (SPARK-33445) Can't parse decimal type from csv file

2020-11-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33445.
---
Resolution: Cannot Reproduce

> Can't parse decimal type from csv file
> --
>
> Key: SPARK-33445
> URL: https://issues.apache.org/jira/browse/SPARK-33445
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 2.4.7, 3.0.0
>Reporter: Punit Shah
>Priority: Major
> Attachments: tsd.csv
>
>
> The attached file is a one-column CSV file containing decimals.
> Execute: {color:#de350b}mydf2 = spark_session.read.csv("tsd.csv",
> header=True, inferSchema=True){color}
> Then invoking {color:#de350b}mydf2.schema{color} will result in the error:
> {color:#ff8b00}ValueError: Could not parse datatype: decimal(6,-7){color}






[jira] [Commented] (SPARK-33044) Add a Jenkins build and test job for Scala 2.13

2020-11-18 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234765#comment-17234765
 ] 

Yang Jie commented on SPARK-33044:
--

Gentle ping, [~shaneknapp]. 

> Add a Jenkins build and test job for Scala 2.13
> ---
>
> Key: SPARK-33044
> URL: https://issues.apache.org/jira/browse/SPARK-33044
> Project: Spark
>  Issue Type: Sub-task
>  Components: jenkins
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> The {{master}} branch seems to be almost ready for Scala 2.13 now; we need a
> Jenkins test job to verify the current work and CI.






[jira] [Assigned] (SPARK-33476) Generalize ExecutorSource to expose user-given file system schemes

2020-11-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33476:
-

Assignee: Dongjoon Hyun

> Generalize ExecutorSource to expose user-given file system schemes
> --
>
> Key: SPARK-33476
> URL: https://issues.apache.org/jira/browse/SPARK-33476
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-33476) Generalize ExecutorSource to expose user-given file system schemes

2020-11-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33476.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30405
[https://github.com/apache/spark/pull/30405]
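
For readers landing here, a hedged sketch of how such a setting would be used. The config name below is an assumption based on the issue title; check PR 30405 for the authoritative name and default:

{code}
# Assumed config name: expose executor filesystem metrics (bytes read/written,
# ops) for additional user-given schemes such as s3a, not just file/hdfs.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.executor.metrics.fileSystemSchemes", "file,hdfs,s3a")  # assumption
    .getOrCreate()
)
{code}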

> Generalize ExecutorSource to expose user-given file system schemes
> --
>
> Key: SPARK-33476
> URL: https://issues.apache.org/jira/browse/SPARK-33476
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>







[jira] [Created] (SPARK-33480) support char/varchar type

2020-11-18 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-33480:
---

 Summary: support char/varchar type
 Key: SPARK-33480
 URL: https://issues.apache.org/jira/browse/SPARK-33480
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.1.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan









[jira] [Commented] (SPARK-19256) Hive bucketing write support

2020-11-18 Thread Piotr Findeisen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234723#comment-17234723
 ] 

Piotr Findeisen commented on SPARK-19256:
-

> 1. For Hive 3.x.y, writing Hive bucketed tables with Hive murmur3hash.
> 2. For Hive 2.x.y and 1.x.y, writing Hive bucketed tables with Hive hivehash.

Should I assume this is a simplification and that it will actually be controlled
by the {{bucketing_version}} table property?
E.g., Hive 3 and Presto 321+ support both bucketing versions, and a user can
still choose one (should they have a compatibility requirement with older
systems).
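
For reference, a minimal sketch of the table property in question as Hive 3 exposes it (the table itself is hypothetical; in Hive 3, 'bucketing_version'='2' selects murmur3hash and '1' the legacy hivehash):

{code}
# Hypothetical Hive-format table; TBLPROPERTIES picks the bucketing hash version.
spark.sql("""
    CREATE TABLE bucketed_src (id INT, name STRING)
    CLUSTERED BY (id) INTO 8 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('bucketing_version' = '2')
""")
{code}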

> Hive bucketing write support
> 
>
> Key: SPARK-19256
> URL: https://issues.apache.org/jira/browse/SPARK-19256
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0
>Reporter: Tejas Patil
>Priority: Minor
>
> Update (2020, by Cheng Su):
> We use this JIRA to track progress on Hive bucketing write support in Spark.
> The goal is for Spark to write Hive bucketed tables so that it is compatible
> with other compute engines (Hive and Presto).
>  
> Current status for Hive bucketed tables in Spark:
> No support for reading Hive bucketed tables: a bucketed table is read as a
> non-bucketed table.
> Wrong behavior for writing Hive ORC and Parquet bucketed tables: ORC/Parquet
> bucketed tables are written as non-bucketed tables (code path:
> InsertIntoHadoopFsRelationCommand -> FileFormatWriter).
> Writing Hive non-ORC/Parquet bucketed tables is not allowed: an exception is
> thrown by default when writing a non-ORC/Parquet bucketed table (code path:
> InsertIntoHiveTable). The exception can be disabled by setting the configs
> `hive.enforce.bucketing`=false and `hive.enforce.sorting`=false, which will
> write the table as non-bucketed.
>  
> Current status for Hive bucketed tables in Hive:
> Hive 3.0.0 and later: supports writing bucketed tables with Hive murmur3hash
> (https://issues.apache.org/jira/browse/HIVE-18910).
> Hive 1.x.y and 2.x.y: supports writing bucketed tables with Hive hivehash.
> Hive on Tez: supports zero or multiple files per bucket
> (https://issues.apache.org/jira/browse/HIVE-14014). A further code pointer on
> the read path:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java#L183-L212].
>  
> Current status for Hive bucketed tables in Presto (presto-sql here):
> Supports writing bucketed tables with Hive murmur3hash and hivehash
> ([https://github.com/prestosql/presto/pull/1697]).
> Supports zero or multiple files per bucket
> ([https://github.com/prestosql/presto/pull/822]).
>  
> TL;DR: the goal is Hive bucketed table compatibility across Spark, Presto, and
> Hive. With this JIRA, we need to add support for writing Hive bucketed tables
> with Hive murmur3hash (for Hive 3.x.y) and hivehash (for Hive 1.x.y and
> 2.x.y).
>  
> Allowing Spark to efficiently read Hive bucketed tables needs a more radical
> change, so we decided to wait until data source v2 supports bucketing and then
> build the read path on data source v2. The read path is not covered by this
> JIRA.
>  
> Original description (2017 by Tejas Patil):
> JIRA to track design discussions and tasks related to Hive bucketing support 
> in Spark.
> Proposal:
> [https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit?usp=sharing]






[jira] [Commented] (SPARK-31800) Unable to disable Kerberos when submitting jobs to Kubernetes

2020-11-18 Thread Bongani Shongwe (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234686#comment-17234686
 ] 

Bongani Shongwe commented on SPARK-31800:
-

I'm also hitting the same issue as the OP; I tested using the Spark 3.0 release.
Has there been any workaround or hypothesis on how to overcome the issue?

> Unable to disable Kerberos when submitting jobs to Kubernetes
> -
>
> Key: SPARK-31800
> URL: https://issues.apache.org/jira/browse/SPARK-31800
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: James Boylan
>Priority: Major
>
> When you attempt to submit a job to Kubernetes using spark-submit with
> --master, it returns the exception:
> {code:java}
> 20/05/22 20:25:54 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> Exception in thread "main" org.apache.spark.SparkException: Please specify 
> spark.kubernetes.file.upload.path property.
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:290)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:246)
> at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
> at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at scala.collection.TraversableLike.map(TraversableLike.scala:238)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:245)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:165)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:163)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
> at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
> at scala.collection.immutable.List.foldLeft(List.scala:89)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 20/05/22 20:25:54 INFO ShutdownHookManager: Shutdown hook called
> 20/05/22 20:25:54 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/p1/y24myg413wx1l1l52bsdn2hrgq/T/spark-c94db9c5-b8a8-414d-b01d-f6369d31c9b8
>  {code}
> No change in settings appears to be able to disable Kerberos. This happens
> when running a simple execution of the SparkPi example on our lab cluster.
> The command being used is
> {code:java}
> ./bin/spark-submit --master k8s://https://{api_hostname} --deploy-mode 
> cluster --name spark-test --class org.apache.spark.examples.SparkPi --conf 
> 

[jira] [Commented] (SPARK-31962) Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234596#comment-17234596
 ] 

Apache Spark commented on SPARK-31962:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/30411

> Provide modifiedAfter and modifiedBefore options when filtering from a 
> batch-based file data source
> ---
>
> Key: SPARK-31962
> URL: https://issues.apache.org/jira/browse/SPARK-31962
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Christopher Highman
>Priority: Minor
>
> Two new options, _modifiedBefore_ and _modifiedAfter_, are provided, expecting
> a value in 'YYYY-MM-DDTHH:mm:ss' format. _PartitioningAwareFileIndex_ considers
> these options while checking for files, just before considering applied
> _PathFilters_ such as {{pathGlobFilter}}. To filter file results, a new
> PathFilter class was derived for this purpose. General housekeeping around
> classes extending PathFilter was performed for neatness. It became apparent
> that support was needed to handle multiple potential path filters, so logic
> was introduced for this purpose and the associated tests written.
>  
> When loading files from a data source, there can often be thousands of files
> within a given file path. In many cases I've seen, we want to start loading
> from a folder path and ideally be able to begin loading only files having
> modification dates past a certain point. This means that, out of thousands of
> potential files, only the ones with modification dates greater than the
> specified timestamp are considered. This saves a ton of time automatically and
> removes significant complexity from managing this in code.
>  
> *Example Usages*
> _Load all CSV files modified after date:_
> {{spark.read.format("csv").option("modifiedAfter","2020-06-15T05:00:00").load()}}
> _Load all CSV files modified before date:_
> {{spark.read.format("csv").option("modifiedBefore","2020-06-15T05:00:00").load()}}
> _Load all CSV files modified between two dates:_
> {{spark.read.format("csv").option("modifiedAfter","2019-01-15T05:00:00").option("modifiedBefore","2020-06-15T05:00:00").load()}}
>  






[jira] [Resolved] (SPARK-33475) Bump ANTLR runtime version to 4.8-1

2020-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33475.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30404
[https://github.com/apache/spark/pull/30404]

> Bump ANTLR runtime version to 4.8-1
> ---
>
> Key: SPARK-33475
> URL: https://issues.apache.org/jira/browse/SPARK-33475
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.1.0
>
>
> This PR intends to upgrade the ANTLR runtime from 4.7.1 to 4.8-1.
> Release notes of v4.8 and v4.7.2 (the v4.7.2 release has a few minor bug fixes
> for Java targets):
>  - v4.8: https://github.com/antlr/antlr4/releases/tag/4.8
>  - v4.7.2: https://github.com/antlr/antlr4/releases/tag/4.7.2






[jira] [Assigned] (SPARK-33475) Bump ANTLR runtime version to 4.8-1

2020-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33475:


Assignee: Takeshi Yamamuro

> Bump ANTLR runtime version to 4.8-1
> ---
>
> Key: SPARK-33475
> URL: https://issues.apache.org/jira/browse/SPARK-33475
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
>
> This PR intends to upgrade the ANTLR runtime from 4.7.1 to 4.8-1.
> Release notes of v4.8 and v4.7.2 (the v4.7.2 release has a few minor bug fixes
> for Java targets):
>  - v4.8: https://github.com/antlr/antlr4/releases/tag/4.8
>  - v4.7.2: https://github.com/antlr/antlr4/releases/tag/4.7.2






[jira] [Resolved] (SPARK-24554) Add MapType Support for Arrow in PySpark

2020-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-24554.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30393
[https://github.com/apache/spark/pull/30393]

> Add MapType Support for Arrow in PySpark
> 
>
> Key: SPARK-24554
> URL: https://issues.apache.org/jira/browse/SPARK-24554
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.3.1
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: bulk-closed
> Fix For: 3.1.0
>
>
> Add support for MapType in Arrow-related classes in Scala/Java and pyarrow
> functionality in Python.
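
As a hedged illustration of what this enables (Spark 3.1+, hypothetical data): a map column can now go through the Arrow path when converting to pandas:

{code}
# With MapType supported in Arrow, toPandas() on a map column can use the
# Arrow-optimized path instead of falling back to the non-Arrow conversion.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
df = spark.sql("SELECT map('a', 1, 'b', 2) AS m")
pdf = df.toPandas()
{code}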






[jira] [Assigned] (SPARK-24554) Add MapType Support for Arrow in PySpark

2020-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-24554:


Assignee: Bryan Cutler

> Add MapType Support for Arrow in PySpark
> 
>
> Key: SPARK-24554
> URL: https://issues.apache.org/jira/browse/SPARK-24554
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.3.1
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: bulk-closed
>
> Add support for MapType in Arrow-related classes in Scala/Java and pyarrow
> functionality in Python.






[jira] [Commented] (SPARK-33478) Allow overwrite a path that is also being read under dynamic partition overwrite

2020-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234488#comment-17234488
 ] 

Apache Spark commented on SPARK-33478:
--

User 'manbuyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30410

> Allow overwrite a path that is also being read under dynamic partition 
> overwrite
> 
>
> Key: SPARK-33478
> URL: https://issues.apache.org/jira/browse/SPARK-33478
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1
>Reporter: jinhai
>Priority: Major
>
> Currently, INSERT OVERWRITE cannot overwrite a path that is also being read
> under dynamic partition overwrite.
> In the class DataSourceAnalysis, DDLUtils.verifyNotReadPath is called to
> determine whether the path can be overwritten, but in the
> InsertIntoHadoopFsRelationCommand.dynamicPartitionOverwrite method we need to
> consider the scenario where staticPartitions.size equals
> partitionColumns.length, which should also be allowed to overwrite a path that
> is also being read.
> Consider the following statements:
> CREATE TABLE insertTable(i int, part1 int, part2 int) USING PARQUET
> PARTITIONED BY (part1, part2);
> INSERT OVERWRITE TABLE insertTable PARTITION(part1=1, part2=1) SELECT i + 1
> FROM insertTable;






[jira] [Assigned] (SPARK-33478) Allow overwrite a path that is also being read under dynamic partition overwrite

2020-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33478:


Assignee: Apache Spark

> Allow overwrite a path that is also being read under dynamic partition 
> overwrite
> 
>
> Key: SPARK-33478
> URL: https://issues.apache.org/jira/browse/SPARK-33478
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1
>Reporter: jinhai
>Assignee: Apache Spark
>Priority: Major
>
> Currently, INSERT OVERWRITE cannot overwrite a path that is also being read
> under dynamic partition overwrite.
> In the class DataSourceAnalysis, DDLUtils.verifyNotReadPath is called to
> determine whether the path can be overwritten, but in the
> InsertIntoHadoopFsRelationCommand.dynamicPartitionOverwrite method we need to
> consider the scenario where staticPartitions.size equals
> partitionColumns.length, which should also be allowed to overwrite a path that
> is also being read.
> Consider the following statements:
> CREATE TABLE insertTable(i int, part1 int, part2 int) USING PARQUET
> PARTITIONED BY (part1, part2);
> INSERT OVERWRITE TABLE insertTable PARTITION(part1=1, part2=1) SELECT i + 1
> FROM insertTable;






[jira] [Commented] (SPARK-33478) Allow overwrite a path that is also being read under dynamic partition overwrite

2020-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234489#comment-17234489
 ] 

Apache Spark commented on SPARK-33478:
--

User 'manbuyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30410

> Allow overwrite a path that is also being read under dynamic partition 
> overwrite
> 
>
> Key: SPARK-33478
> URL: https://issues.apache.org/jira/browse/SPARK-33478
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1
>Reporter: jinhai
>Priority: Major
>
> Currently, INSERT OVERWRITE cannot overwrite a path that is also being read
> under dynamic partition overwrite.
> In the class DataSourceAnalysis, DDLUtils.verifyNotReadPath is called to
> determine whether the path can be overwritten, but in the
> InsertIntoHadoopFsRelationCommand.dynamicPartitionOverwrite method we need to
> consider the scenario where staticPartitions.size equals
> partitionColumns.length, which should also be allowed to overwrite a path that
> is also being read.
> Consider the following statements:
> CREATE TABLE insertTable(i int, part1 int, part2 int) USING PARQUET
> PARTITIONED BY (part1, part2);
> INSERT OVERWRITE TABLE insertTable PARTITION(part1=1, part2=1) SELECT i + 1
> FROM insertTable;






[jira] [Assigned] (SPARK-33478) Allow overwrite a path that is also being read under dynamic partition overwrite

2020-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33478:


Assignee: (was: Apache Spark)

> Allow overwrite a path that is also being read under dynamic partition 
> overwrite
> 
>
> Key: SPARK-33478
> URL: https://issues.apache.org/jira/browse/SPARK-33478
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1
>Reporter: jinhai
>Priority: Major
>
> Currently, INSERT OVERWRITE cannot overwrite a path that is also being read
> under dynamic partition overwrite.
> In the class DataSourceAnalysis, DDLUtils.verifyNotReadPath is called to
> determine whether the path can be overwritten, but in the
> InsertIntoHadoopFsRelationCommand.dynamicPartitionOverwrite method we need to
> consider the scenario where staticPartitions.size equals
> partitionColumns.length, which should also be allowed to overwrite a path that
> is also being read.
> Consider the following statements:
> CREATE TABLE insertTable(i int, part1 int, part2 int) USING PARQUET
> PARTITIONED BY (part1, part2);
> INSERT OVERWRITE TABLE insertTable PARTITION(part1=1, part2=1) SELECT i + 1
> FROM insertTable;






[jira] [Assigned] (SPARK-33479) Make apiKey of docsearch configurable

2020-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33479:


Assignee: Gengliang Wang  (was: Apache Spark)

> Make apiKey of docsearch configurable
> -
>
> Key: SPARK-33479
> URL: https://issues.apache.org/jira/browse/SPARK-33479
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> After https://github.com/apache/spark/pull/30292, our Spark documentation
> site supports searching.
> However, the default API key always points to the latest release docs. We have
> to set different API keys for different releases; otherwise, the search
> results are always based on the latest documentation
> (https://spark.apache.org/docs/latest/) even when visiting the documentation
> of previous releases.
> As per the discussion in
> https://github.com/apache/spark/pull/30292#issuecomment-725613417, we should
> make the API key configurable and avoid hardcoding it in the HTML template.






[jira] [Assigned] (SPARK-33479) Make apiKey of docsearch configurable

2020-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33479:


Assignee: Apache Spark  (was: Gengliang Wang)

> Make apiKey of docsearch configurable
> -
>
> Key: SPARK-33479
> URL: https://issues.apache.org/jira/browse/SPARK-33479
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
> After https://github.com/apache/spark/pull/30292, our Spark documentation
> site supports searching.
> However, the default API key always points to the latest release docs. We have
> to set different API keys for different releases; otherwise, the search
> results are always based on the latest documentation
> (https://spark.apache.org/docs/latest/) even when visiting the documentation
> of previous releases.
> As per the discussion in
> https://github.com/apache/spark/pull/30292#issuecomment-725613417, we should
> make the API key configurable and avoid hardcoding it in the HTML template.






[jira] [Commented] (SPARK-33479) Make apiKey of docsearch configurable

2020-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234484#comment-17234484
 ] 

Apache Spark commented on SPARK-33479:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30409

> Make apiKey of docsearch configurable
> -
>
> Key: SPARK-33479
> URL: https://issues.apache.org/jira/browse/SPARK-33479
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> After https://github.com/apache/spark/pull/30292, our Spark documentation
> site supports searching.
> However, the default API key always points to the latest release docs. We have
> to set different API keys for different releases; otherwise, the search
> results are always based on the latest documentation
> (https://spark.apache.org/docs/latest/) even when visiting the documentation
> of previous releases.
> As per the discussion in
> https://github.com/apache/spark/pull/30292#issuecomment-725613417, we should
> make the API key configurable and avoid hardcoding it in the HTML template.






[jira] [Created] (SPARK-33479) Make apiKey of docsearch configurable

2020-11-18 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-33479:
--

 Summary: Make apiKey of docsearch configurable
 Key: SPARK-33479
 URL: https://issues.apache.org/jira/browse/SPARK-33479
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


After https://github.com/apache/spark/pull/30292, our Spark documentation site
supports searching.
However, the default API key always points to the latest release docs. We have
to set different API keys for different releases; otherwise, the search results
are always based on the latest documentation
(https://spark.apache.org/docs/latest/) even when visiting the documentation of
previous releases.

As per the discussion in
https://github.com/apache/spark/pull/30292#issuecomment-725613417, we should
make the API key configurable and avoid hardcoding it in the HTML template.






[jira] [Created] (SPARK-33478) Allow overwrite a path that is also being read under dynamic partition overwrite

2020-11-18 Thread jinhai (Jira)
jinhai created SPARK-33478:
--

 Summary: Allow overwrite a path that is also being read under 
dynamic partition overwrite
 Key: SPARK-33478
 URL: https://issues.apache.org/jira/browse/SPARK-33478
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 3.0.0
Reporter: jinhai


Currently, INSERT OVERWRITE cannot overwrite a path that is also being read
under dynamic partition overwrite.
In the class DataSourceAnalysis, DDLUtils.verifyNotReadPath is called to
determine whether the path can be overwritten, but in the
InsertIntoHadoopFsRelationCommand.dynamicPartitionOverwrite method we need to
consider the scenario where staticPartitions.size equals
partitionColumns.length, which should also be allowed to overwrite a path that
is also being read.

Consider the following statements:

CREATE TABLE insertTable(i int, part1 int, part2 int) USING PARQUET PARTITIONED
BY (part1, part2);

INSERT OVERWRITE TABLE insertTable PARTITION(part1=1, part2=1) SELECT i + 1
FROM insertTable;
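
For context, a minimal sketch of the mode involved, assuming a 3.0.x session: dynamic partition overwrite is switched on via a session config, and the last statement is exactly the case described above (fully static partition spec, reading from the table being overwritten):

{code}
# Only partitions receiving new rows are replaced in this mode.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.sql("CREATE TABLE insertTable(i INT, part1 INT, part2 INT) "
          "USING PARQUET PARTITIONED BY (part1, part2)")
# staticPartitions.size == partitionColumns.length here, and the source and
# target are the same table -- the scenario this issue wants to allow.
spark.sql("INSERT OVERWRITE TABLE insertTable PARTITION(part1=1, part2=1) "
          "SELECT i + 1 FROM insertTable")
{code}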






[jira] [Commented] (SPARK-33477) Hive partition pruning support date type

2020-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234386#comment-17234386
 ] 

Apache Spark commented on SPARK-33477:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30408

>  Hive partition pruning support date type
> -
>
> Key: SPARK-33477
> URL: https://issues.apache.org/jira/browse/SPARK-33477
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Hive partition pruning can support date type:
> https://issues.apache.org/jira/browse/HIVE-5679
> https://github.com/apache/hive/commit/5106bf1c8671740099fca8e1a7d4b37afe97137f
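
As a hedged illustration (the table below is hypothetical), this is the kind of date-typed partition predicate the change would let Spark push to the Hive metastore for pruning:

{code}
# Hypothetical partitioned table; with date-type pushdown the metastore can
# prune partitions for this filter instead of returning all of them.
spark.sql("CREATE TABLE events (id INT) PARTITIONED BY (event_date DATE) STORED AS PARQUET")
spark.sql("SELECT * FROM events WHERE event_date > DATE '2020-01-01'").explain()
{code}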






[jira] [Assigned] (SPARK-33477) Hive partition pruning support date type

2020-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33477:


Assignee: Yuming Wang  (was: Apache Spark)

>  Hive partition pruning support date type
> -
>
> Key: SPARK-33477
> URL: https://issues.apache.org/jira/browse/SPARK-33477
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Hive partition pruning can support date type:
> https://issues.apache.org/jira/browse/HIVE-5679
> https://github.com/apache/hive/commit/5106bf1c8671740099fca8e1a7d4b37afe97137f






[jira] [Assigned] (SPARK-33477) Hive partition pruning support date type

2020-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33477:


Assignee: Apache Spark  (was: Yuming Wang)

>  Hive partition pruning support date type
> -
>
> Key: SPARK-33477
> URL: https://issues.apache.org/jira/browse/SPARK-33477
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Hive partition pruning can support date type:
> https://issues.apache.org/jira/browse/HIVE-5679
> https://github.com/apache/hive/commit/5106bf1c8671740099fca8e1a7d4b37afe97137f






[jira] [Updated] (SPARK-33477) Hive partition pruning support date type

2020-11-18 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33477:

Fix Version/s: (was: 3.1.0)

>  Hive partition pruning support date type
> -
>
> Key: SPARK-33477
> URL: https://issues.apache.org/jira/browse/SPARK-33477
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Hive partition pruning can support date type:
> https://issues.apache.org/jira/browse/HIVE-5679
> https://github.com/apache/hive/commit/5106bf1c8671740099fca8e1a7d4b37afe97137f






[jira] [Updated] (SPARK-33477) Hive partition pruning support date type

2020-11-18 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33477:

Description: 
Hive partition pruning can support date type:
https://issues.apache.org/jira/browse/HIVE-5679
https://github.com/apache/hive/commit/5106bf1c8671740099fca8e1a7d4b37afe97137f

  was:
Hive partition pruning can support date type:
https://issues.apache.org/jira/browse/HIVE-5679


>  Hive partition pruning support date type
> -
>
> Key: SPARK-33477
> URL: https://issues.apache.org/jira/browse/SPARK-33477
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> Hive partition pruning can support date type:
> https://issues.apache.org/jira/browse/HIVE-5679
> https://github.com/apache/hive/commit/5106bf1c8671740099fca8e1a7d4b37afe97137f






[jira] [Updated] (SPARK-33477) Hive partition pruning support date type

2020-11-18 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33477:

Description: 
Hive partition pruning can support date type:
https://issues.apache.org/jira/browse/HIVE-5679

  was:
Hive partition pruning can support Contains, StartsWith and EndsWith predicate:
https://github.com/apache/hive/blob/0c2c8a7f57330880f156466526bc0fdc94681035/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java#L1074-L1075
https://github.com/apache/hive/commit/0c2c8a7f57330880f156466526bc0fdc94681035#diff-b1200d4259fafd48d7bbd0050e89772218813178f68461a2e82551c52319b282


>  Hive partition pruning support date type
> -
>
> Key: SPARK-33477
> URL: https://issues.apache.org/jira/browse/SPARK-33477
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> Hive partition pruning can support date type:
> https://issues.apache.org/jira/browse/HIVE-5679






[jira] [Created] (SPARK-33477) Hive partition pruning support date type

2020-11-18 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-33477:
---

 Summary:  Hive partition pruning support date type
 Key: SPARK-33477
 URL: https://issues.apache.org/jira/browse/SPARK-33477
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Yuming Wang
Assignee: Yuming Wang
 Fix For: 3.1.0


Hive partition pruning can support Contains, StartsWith and EndsWith predicate:
https://github.com/apache/hive/blob/0c2c8a7f57330880f156466526bc0fdc94681035/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java#L1074-L1075
https://github.com/apache/hive/commit/0c2c8a7f57330880f156466526bc0fdc94681035#diff-b1200d4259fafd48d7bbd0050e89772218813178f68461a2e82551c52319b282





