[jira] [Updated] (SPARK-18230) MatrixFactorizationModel.recommendProducts throws NoSuchElement exception when the user does not exist
[ https://issues.apache.org/jira/browse/SPARK-18230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18230: -- Fix Version/s: (was: 2.4.0) (We don't set fix version until it's resolved) > MatrixFactorizationModel.recommendProducts throws NoSuchElement exception > when the user does not exist > -- > > Key: SPARK-18230 > URL: https://issues.apache.org/jira/browse/SPARK-18230 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.0.1 >Reporter: Mikael Ståldal >Priority: Minor > > When invoking {{MatrixFactorizationModel.recommendProducts(Int, Int)}} with a > non-existing user, a {{java.util.NoSuchElementException}} is thrown: > {code} > java.util.NoSuchElementException: next on empty iterator > at scala.collection.Iterator$$anon$2.next(Iterator.scala:39) > at scala.collection.Iterator$$anon$2.next(Iterator.scala:37) > at > scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63) > at scala.collection.IterableLike$class.head(IterableLike.scala:107) > at > scala.collection.mutable.WrappedArray.scala$collection$IndexedSeqOptimized$$super$head(WrappedArray.scala:35) > at > scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126) > at scala.collection.mutable.WrappedArray.head(WrappedArray.scala:35) > at > org.apache.spark.mllib.recommendation.MatrixFactorizationModel.recommendProducts(MatrixFactorizationModel.scala:169) > {code} > It would be nice if it returned an empty array or threw a more specific > exception, and if that were documented in the ScalaDoc for the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
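The fix suggested in the report (return an empty array instead of calling `.head` on an empty collection) can be sketched in plain Scala. The `userFeatures` map and the method names below are hypothetical stand-ins for the model internals, not the actual MLlib code:

```scala
// Hypothetical stand-in for MatrixFactorizationModel's user-feature lookup.
object RecommendSketch {
  val userFeatures: Map[Int, Array[Double]] = Map(1 -> Array(0.1, 0.2))

  // Mirrors the failing pattern: .head on an empty result throws
  // java.util.NoSuchElementException for an unknown user.
  def recommendUnsafe(user: Int): Array[Double] =
    userFeatures.filter { case (id, _) => id == user }.values.head

  // Suggested behavior: fall back to an empty array when the user is absent.
  def recommendSafe(user: Int): Array[Double] =
    userFeatures.getOrElse(user, Array.empty[Double])
}
```

With this pattern, `recommendSafe(42)` returns an empty array where `recommendUnsafe(42)` would throw.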
[jira] [Resolved] (SPARK-24799) A solution of dealing with data skew in left,right,inner join
[ https://issues.apache.org/jira/browse/SPARK-24799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-24799. --- Resolution: Duplicate > A solution of dealing with data skew in left,right,inner join > - > > Key: SPARK-24799 > URL: https://issues.apache.org/jira/browse/SPARK-24799 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: marymwu >Priority: Major > > For left, right, and inner join statement execution, this solution mainly > divides the partitions where data skew has occurred into several partitions > with smaller data scale, in order to execute more tasks in parallel and > increase efficiency. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24799) A solution of dealing with data skew in left,right,inner join
[ https://issues.apache.org/jira/browse/SPARK-24799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-24799: -- Target Version/s: (was: 2.3.0) Fix Version/s: (was: 2.3.0) Have a quick look at [https://spark.apache.org/contributing.html] for guidance on filling out JIRAs. This one, yeah, is likely a duplicate of other general issues about skew. > A solution of dealing with data skew in left,right,inner join > - > > Key: SPARK-24799 > URL: https://issues.apache.org/jira/browse/SPARK-24799 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: marymwu >Priority: Major > > For left, right, and inner join statement execution, this solution mainly > divides the partitions where data skew has occurred into several partitions > with smaller data scale, in order to execute more tasks in parallel and > increase efficiency. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24807) Adding files/jars twice: output a warning and add a note
[ https://issues.apache.org/jira/browse/SPARK-24807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24807. - Resolution: Fixed Assignee: Maxim Gekk Fix Version/s: 2.4.0 > Adding files/jars twice: output a warning and add a note > > > Key: SPARK-24807 > URL: https://issues.apache.org/jira/browse/SPARK-24807 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > Fix For: 2.4.0 > > > In the current version of Spark (2.3.x), a file/jar can be added only once. > Subsequent additions of the same path are silently ignored. This behavior is > not properly documented: > https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.SparkContext > This confuses our users and support teams in our company. The ticket aims to > output a warning which should clearly state that a second addition of the > same path is currently not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24776) AVRO unit test: use SQLTestUtils and Replace deprecated methods
[ https://issues.apache.org/jira/browse/SPARK-24776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24776. - Resolution: Fixed > AVRO unit test: use SQLTestUtils and Replace deprecated methods > --- > > Key: SPARK-24776 > URL: https://issues.apache.org/jira/browse/SPARK-24776 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 2.4.0 > > > * use SQLTestUtils > * Replace deprecated methods > * etc > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive
[ https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544398#comment-16544398 ] Hyukjin Kwon commented on SPARK-20202: -- I think we are unclear about how we are going to deal with this, and it's been left open for a while. [~rxin], do you maybe have some preference in [my comment above|https://issues.apache.org/jira/browse/SPARK-20202?focusedCommentId=16541034=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16541034]? 1. Go with Saisai's patch in HIVE-16391 - Publishing Hive 1.2.x could be easier but will give some overhead to the Hive side (e.g., maintaining the old branches and backports). - If I understood correctly, we have fewer problems (e.g., policy stuff) if we go with publishing Hive 1.2.x HIVE-16391 2. Target the upgrade with [~q79969786]'s fix, and add some fixes to our current fork when there are strong reasons - It is difficult, but [~q79969786] made and completed an initial try at the upgrade. It still needs some further investigation (e.g., see [SPARK-23710|https://issues.apache.org/jira/browse/SPARK-23710]), but the try made the regression tests pass at least. She's willing to finish this. - If we miss the Hive upgrade to 2.3.x in Spark 3.0.0, we should probably target 4.0.0 with a higher version of Hive, which I guess makes this upgrade even harder. - It looks like we implicitly agree that this should be the final goal in the long term. See also [~ste...@apache.org]'s [comment above|https://issues.apache.org/jira/browse/SPARK-20202?focusedCommentId=16500560=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16500560]. I am re-raising this and giving a refresh here because I personally see: - Few new facts have arrived since the JIRA was opened, so it looked to me like it might be better to consider the possible options again. - It looks like we are quite unclear about how we should get through this. 
- I am sure we need to be on the same page about this JIRA, and it looks like I need some more support from you guys before we go ahead, because it'd be a kind of change that is not easily revertible. - Branch-2.4 will be cut soon and we will go for Spark 3.0.0, if I am not mistaken. I know there are many sensitive things going on here; however, please kindly consider this and give some input. I am sure we all feel that we should resolve this. Lastly, FWIW, I am doing this on my own, individually, if it matters to anyone. > Remove references to org.spark-project.hive > --- > > Key: SPARK-20202 > URL: https://issues.apache.org/jira/browse/SPARK-20202 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 1.6.4, 2.0.3, 2.1.1 >Reporter: Owen O'Malley >Priority: Major > > Spark can't continue to depend on their fork of Hive and must move to > standard Hive versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24718) Timestamp support pushdown to parquet data source
[ https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-24718: Assignee: Yuming Wang > Timestamp support pushdown to parquet data source > - > > Key: SPARK-24718 > URL: https://issues.apache.org/jira/browse/SPARK-24718 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 2.4.0 > > > Something like this: > {code:java} > case ParquetSchemaType(TIMESTAMP_MICROS, INT64, null) > if pushDownDecimal => > (n: String, v: Any) => FilterApi.eq( > longColumn(n), > Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000) > .asInstanceOf[java.lang.Long]).orNull) > case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, null) > if pushDownDecimal => > (n: String, v: Any) => FilterApi.eq( > longColumn(n), > Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime > .asInstanceOf[java.lang.Long]).orNull) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24718) Timestamp support pushdown to parquet data source
[ https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24718. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21741 [https://github.com/apache/spark/pull/21741] > Timestamp support pushdown to parquet data source > - > > Key: SPARK-24718 > URL: https://issues.apache.org/jira/browse/SPARK-24718 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 2.4.0 > > > Something like this: > {code:java} > case ParquetSchemaType(TIMESTAMP_MICROS, INT64, null) > if pushDownDecimal => > (n: String, v: Any) => FilterApi.eq( > longColumn(n), > Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000) > .asInstanceOf[java.lang.Long]).orNull) > case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, null) > if pushDownDecimal => > (n: String, v: Any) => FilterApi.eq( > longColumn(n), > Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime > .asInstanceOf[java.lang.Long]).orNull) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
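The conversion in the quoted snippet maps a `java.sql.Timestamp` to the integer form an INT64 Parquet column stores: epoch milliseconds for TIMESTAMP_MILLIS, and milliseconds times 1000 for TIMESTAMP_MICROS (which drops any sub-millisecond precision carried by the timestamp). A standalone sketch of just that conversion, with hypothetical helper names:

```scala
import java.sql.Timestamp

object TimestampPushdownSketch {
  // TIMESTAMP_MILLIS: epoch milliseconds, as used for the INT64 column filter.
  def toMillis(ts: Timestamp): Long = ts.getTime

  // TIMESTAMP_MICROS: epoch microseconds; note this is getTime * 1000, so
  // nanosecond precision beyond the millisecond is lost.
  def toMicros(ts: Timestamp): Long = ts.getTime * 1000
}
```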
[jira] [Comment Edited] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
[ https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544386#comment-16544386 ] Saisai Shao edited comment on SPARK-24804 at 7/15/18 1:29 AM: -- Please don't set target version or fix version. Committers will help to set when this issue is resolved. I'm removing the target version of this Jira to avoid blocking the release of 2.3.2 was (Author: jerryshao): Please don't set target version or fix version. Committers will help to set when this issue is resolved. > There are duplicate words in the title in the DatasetSuite > -- > > Key: SPARK-24804 > URL: https://issues.apache.org/jira/browse/SPARK-24804 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: hantiantian >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
[ https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544386#comment-16544386 ] Saisai Shao commented on SPARK-24804: - Please don't set target version or fix version. Committers will help to set when this issue is resolved. > There are duplicate words in the title in the DatasetSuite > -- > > Key: SPARK-24804 > URL: https://issues.apache.org/jira/browse/SPARK-24804 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: hantiantian >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
[ https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-24804: Priority: Trivial (was: Minor) > There are duplicate words in the title in the DatasetSuite > -- > > Key: SPARK-24804 > URL: https://issues.apache.org/jira/browse/SPARK-24804 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: hantiantian >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
[ https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-24804: Target Version/s: (was: 2.3.2) > There are duplicate words in the title in the DatasetSuite > -- > > Key: SPARK-24804 > URL: https://issues.apache.org/jira/browse/SPARK-24804 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: hantiantian >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
[ https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-24804: Fix Version/s: (was: 2.3.2) > There are duplicate words in the title in the DatasetSuite > -- > > Key: SPARK-24804 > URL: https://issues.apache.org/jira/browse/SPARK-24804 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: hantiantian >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24808) Unable to get the SparkEnv.get.metricsSystem on Spark Shell
Praneeth Ramesh created SPARK-24808: --- Summary: Unable to get the SparkEnv.get.metricsSystem on Spark Shell Key: SPARK-24808 URL: https://issues.apache.org/jira/browse/SPARK-24808 Project: Spark Issue Type: Bug Components: Spark Core, Spark Shell Affects Versions: 2.3.1 Environment: Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144) Reporter: Praneeth Ramesh Open a Spark shell in local mode or on a standalone cluster: scala> import org.apache.spark.SparkEnv import org.apache.spark.SparkEnv scala> SparkEnv.get.metricsSystem error: missing or invalid dependency detected while loading class file 'MetricsSystem.class'. Could not access term eclipse in package org, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.) A full rebuild may help if 'MetricsSystem.class' was compiled against an incompatible version of org. error: missing or invalid dependency detected while loading class file 'MetricsSystem.class'. Could not access term jetty in value org.eclipse, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.) A full rebuild may help if 'MetricsSystem.class' was compiled against an incompatible version of org.eclipse. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24754) Minhash integer overflow
[ https://issues.apache.org/jira/browse/SPARK-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-24754: - Assignee: Sean Owen > Minhash integer overflow > > > Key: SPARK-24754 > URL: https://issues.apache.org/jira/browse/SPARK-24754 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.1.0 >Reporter: Jiayuan Ma >Assignee: Sean Owen >Priority: Minor > Fix For: 2.4.0 > > > Hash computation in MinHashLSHModel has an integer overflow bug. > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala#L69 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24754) Minhash integer overflow
[ https://issues.apache.org/jira/browse/SPARK-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-24754. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21750 [https://github.com/apache/spark/pull/21750] > Minhash integer overflow > > > Key: SPARK-24754 > URL: https://issues.apache.org/jira/browse/SPARK-24754 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.1.0 >Reporter: Jiayuan Ma >Assignee: Sean Owen >Priority: Minor > Fix For: 2.4.0 > > > Hash computation in MinHashLSHModel has an integer overflow bug. > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala#L69 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
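The overflow at the linked line comes from doing the hash arithmetic in `Int`: the multiply-and-add of the form `(1 + elem) * a + b` can exceed `Int.MaxValue` before the modulo by the prime is applied, wrapping to an incorrect (possibly negative) value. A minimal sketch of the bug and the usual fix of widening to `Long` first; the prime below is the one used by MinHashLSH, but the coefficients are illustrative, not the model's actual random coefficients:

```scala
object MinHashOverflowSketch {
  val prime = 2038074743 // the prime used by MinHashLSH (HASH_PRIME)

  // Buggy pattern: the multiply-and-add wraps around in 32-bit arithmetic
  // before the modulo is taken, so the result can be wrong or negative.
  def hashInt(elem: Int, a: Int, b: Int): Int =
    ((1 + elem) * a + b) % prime

  // Fixed pattern: widen to Long first, then reduce modulo the prime.
  def hashLong(elem: Int, a: Int, b: Int): Long =
    ((1L + elem) * a + b) % prime
}
```

For example, with `elem = 100000` and `a = 1000000`, the product 100001000000 overflows `Int`, so the two versions disagree.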
[jira] [Assigned] (SPARK-24807) Adding files/jars twice: output a warning and add a note
[ https://issues.apache.org/jira/browse/SPARK-24807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24807: Assignee: Apache Spark > Adding files/jars twice: output a warning and add a note > > > Key: SPARK-24807 > URL: https://issues.apache.org/jira/browse/SPARK-24807 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Trivial > > In the current version of Spark (2.3.x), a file/jar can be added only once. > Subsequent additions of the same path are silently ignored. This behavior is > not properly documented: > https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.SparkContext > This confuses our users and support teams in our company. The ticket aims to > output a warning which should clearly state that a second addition of the > same path is currently not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24807) Adding files/jars twice: output a warning and add a note
[ https://issues.apache.org/jira/browse/SPARK-24807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544236#comment-16544236 ] Apache Spark commented on SPARK-24807: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/21771 > Adding files/jars twice: output a warning and add a note > > > Key: SPARK-24807 > URL: https://issues.apache.org/jira/browse/SPARK-24807 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Trivial > > In the current version of Spark (2.3.x), a file/jar can be added only once. > Subsequent additions of the same path are silently ignored. This behavior is > not properly documented: > https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.SparkContext > This confuses our users and support teams in our company. The ticket aims to > output a warning which should clearly state that a second addition of the > same path is currently not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24807) Adding files/jars twice: output a warning and add a note
[ https://issues.apache.org/jira/browse/SPARK-24807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24807: Assignee: (was: Apache Spark) > Adding files/jars twice: output a warning and add a note > > > Key: SPARK-24807 > URL: https://issues.apache.org/jira/browse/SPARK-24807 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Trivial > > In the current version of Spark (2.3.x), a file/jar can be added only once. > Subsequent additions of the same path are silently ignored. This behavior is > not properly documented: > https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.SparkContext > This confuses our users and support teams in our company. The ticket aims to > output a warning which should clearly state that a second addition of the > same path is currently not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24807) Adding files/jars twice: output a warning and add a note
Maxim Gekk created SPARK-24807: -- Summary: Adding files/jars twice: output a warning and add a note Key: SPARK-24807 URL: https://issues.apache.org/jira/browse/SPARK-24807 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.3.1 Reporter: Maxim Gekk In the current version of Spark (2.3.x), a file/jar can be added only once. Subsequent additions of the same path are silently ignored. This behavior is not properly documented: https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.SparkContext This confuses our users and support teams in our company. The ticket aims to output a warning which should clearly state that a second addition of the same path is currently not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
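The behavior the ticket describes (a repeated addition of the same path is ignored) plus the proposed warning can be sketched with a plain Scala set tracking already-added paths. `AddPathSketch`, `addedPaths`, and `warnings` below are hypothetical stand-ins, not SparkContext's actual fields or API:

```scala
import scala.collection.mutable

// Hypothetical sketch of add-once semantics plus the proposed warning.
object AddPathSketch {
  private val addedPaths = mutable.Set.empty[String]
  val warnings = mutable.Buffer.empty[String] // stand-in for log output

  // Returns true if the path was newly added. A repeated path is still
  // ignored, but now with an explicit warning instead of silence.
  def addPath(path: String): Boolean =
    if (addedPaths.add(path)) true
    else {
      warnings += s"The path $path has been added already; overwriting is not supported."
      false
    }
}
```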
[jira] [Assigned] (SPARK-24806) Brush up generated code so that JDK Java compilers can handle it
[ https://issues.apache.org/jira/browse/SPARK-24806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24806: Assignee: (was: Apache Spark) > Brush up generated code so that JDK Java compilers can handle it > > > Key: SPARK-24806 > URL: https://issues.apache.org/jira/browse/SPARK-24806 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Takeshi Yamamuro >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24806) Brush up generated code so that JDK Java compilers can handle it
[ https://issues.apache.org/jira/browse/SPARK-24806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544225#comment-16544225 ] Apache Spark commented on SPARK-24806: -- User 'maropu' has created a pull request for this issue: https://github.com/apache/spark/pull/21770 > Brush up generated code so that JDK Java compilers can handle it > > > Key: SPARK-24806 > URL: https://issues.apache.org/jira/browse/SPARK-24806 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Takeshi Yamamuro >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24806) Brush up generated code so that JDK Java compilers can handle it
[ https://issues.apache.org/jira/browse/SPARK-24806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24806: Assignee: Apache Spark > Brush up generated code so that JDK Java compilers can handle it > > > Key: SPARK-24806 > URL: https://issues.apache.org/jira/browse/SPARK-24806 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Takeshi Yamamuro >Assignee: Apache Spark >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24806) Brush up generated code so that JDK Java compilers can handle it
Takeshi Yamamuro created SPARK-24806: Summary: Brush up generated code so that JDK Java compilers can handle it Key: SPARK-24806 URL: https://issues.apache.org/jira/browse/SPARK-24806 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.1 Reporter: Takeshi Yamamuro -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24805) Don't ignore files without .avro extension by default
[ https://issues.apache.org/jira/browse/SPARK-24805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24805: Assignee: Apache Spark > Don't ignore files without .avro extension by default > - > > Key: SPARK-24805 > URL: https://issues.apache.org/jira/browse/SPARK-24805 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Currently, to read files without the .avro extension, users have to set the > flag *avro.mapred.ignore.inputs.without.extension* to *false* (by default it > is *true*). The ticket aims to change the default value to *false*. The > reasons to do that are: > - Other systems can create Avro files without extensions. When users try to > read such files, they silently get only partial results. This behavior may > confuse users. > - The current behavior differs from the other supported datasources CSV and > JSON. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24805) Don't ignore files without .avro extension by default
[ https://issues.apache.org/jira/browse/SPARK-24805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24805: Assignee: (was: Apache Spark) > Don't ignore files without .avro extension by default > - > > Key: SPARK-24805 > URL: https://issues.apache.org/jira/browse/SPARK-24805 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Major > > Currently, to read files without the .avro extension, users have to set the > flag *avro.mapred.ignore.inputs.without.extension* to *false* (by default it > is *true*). The ticket aims to change the default value to *false*. The > reasons to do that are: > - Other systems can create Avro files without extensions. When users try to > read such files, they silently get only partial results. This behavior may > confuse users. > - The current behavior differs from the other supported datasources CSV and > JSON. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24805) Don't ignore files without .avro extension by default
[ https://issues.apache.org/jira/browse/SPARK-24805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544175#comment-16544175 ] Apache Spark commented on SPARK-24805: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/21769 > Don't ignore files without .avro extension by default > - > > Key: SPARK-24805 > URL: https://issues.apache.org/jira/browse/SPARK-24805 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Major > > Currently, to read files without the .avro extension, users have to set the > flag *avro.mapred.ignore.inputs.without.extension* to *false* (by default it > is *true*). The ticket aims to change the default value to *false*. The > reasons to do that are: > - Other systems can create Avro files without extensions. When users try to > read such files, they silently get only partial results. This behavior may > confuse users. > - The current behavior differs from the other supported datasources CSV and > JSON. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24805) Don't ignore files without .avro extension by default
Maxim Gekk created SPARK-24805: -- Summary: Don't ignore files without .avro extension by default Key: SPARK-24805 URL: https://issues.apache.org/jira/browse/SPARK-24805 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.1 Reporter: Maxim Gekk Currently, to read files without the .avro extension, users have to set the flag *avro.mapred.ignore.inputs.without.extension* to *false* (by default it is *true*). The ticket aims to change the default value to *false*. The reasons to do that are: - Other systems can create Avro files without extensions. When users try to read such files, they silently get only partial results. This behavior may confuse users. - The current behavior differs from the other supported datasources CSV and JSON. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23255) Add user guide and examples for DataFrame image reading functions
[ https://issues.apache.org/jira/browse/SPARK-23255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544159#comment-16544159 ] Divay Jindal commented on SPARK-23255: -- Hey, can I take up this issue? > Add user guide and examples for DataFrame image reading functions > - > > Key: SPARK-23255 > URL: https://issues.apache.org/jira/browse/SPARK-23255 > Project: Spark > Issue Type: Documentation > Components: ML, PySpark >Affects Versions: 2.3.0 >Reporter: Nick Pentreath >Priority: Minor > > SPARK-21866 added built-in support for reading image data into a DataFrame. > This new functionality should be documented in the user guide, with example > usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24619) DOCO - usage of authenticate.secret between masters/workers not clear
[ https://issues.apache.org/jira/browse/SPARK-24619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544151#comment-16544151 ] Divay Jindal commented on SPARK-24619: -- Can I take this issue? > DOCO - usage of authenticate.secret between masters/workers not clear > - > > Key: SPARK-24619 > URL: https://issues.apache.org/jira/browse/SPARK-24619 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: t oo >Priority: Major > > The documentation is not clear on whether authenticate.secret can be passed > in via --conf to spark-submit -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17091) Convert IN predicate to equivalent Parquet filter
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-17091. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21603 [https://github.com/apache/spark/pull/21603] > Convert IN predicate to equivalent Parquet filter > - > > Key: SPARK-17091 > URL: https://issues.apache.org/jira/browse/SPARK-17091 > Project: Spark > Issue Type: Bug >Reporter: Andrew Duffy >Assignee: Yuming Wang >Priority: Major > Fix For: 2.4.0 > > Attachments: IN Predicate.png, OR Predicate.png > > > Past attempts at pushing down the InSet operation for Parquet relied on > user-defined predicates. It would be simpler to rewrite an IN clause into the > corresponding OR union of a set of equality conditions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
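The rewrite this ticket resolves can be sketched in a few lines. The following is an illustrative string-level sketch in plain Python, not Spark's internal expression API: an `IN` predicate over a literal set becomes the `OR` union of equality conditions, each of which Parquet's built-in filters can evaluate and push down.

```python
def rewrite_in_to_or(column, values):
    """Rewrite `column IN (v1, v2, ...)` as
    `(column = v1) OR (column = v2) OR ...`."""
    if not values:
        return "FALSE"  # IN over an empty set matches no rows
    return " OR ".join(f"({column} = {v!r})" for v in values)

print(rewrite_in_to_or("id", [1, 2, 3]))
# (id = 1) OR (id = 2) OR (id = 3)
```

In Spark itself the transformation operates on `InSet`/`In` expression trees rather than SQL strings, but the shape of the rewrite is the same.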
[jira] [Assigned] (SPARK-17091) Convert IN predicate to equivalent Parquet filter
[ https://issues.apache.org/jira/browse/SPARK-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-17091: Assignee: Yuming Wang > Convert IN predicate to equivalent Parquet filter > - > > Key: SPARK-17091 > URL: https://issues.apache.org/jira/browse/SPARK-17091 > Project: Spark > Issue Type: Bug >Reporter: Andrew Duffy >Assignee: Yuming Wang >Priority: Major > Attachments: IN Predicate.png, OR Predicate.png > > > Past attempts at pushing down the InSet operation for Parquet relied on > user-defined predicates. It would be simpler to rewrite an IN clause into the > corresponding OR union of a set of equality conditions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24776) AVRO unit test: use SQLTestUtils and Replace deprecated methods
[ https://issues.apache.org/jira/browse/SPARK-24776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24776: Assignee: Gengliang Wang (was: Apache Spark) > AVRO unit test: use SQLTestUtils and Replace deprecated methods > --- > > Key: SPARK-24776 > URL: https://issues.apache.org/jira/browse/SPARK-24776 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 2.4.0 > > > * use SQLTestUtils > * Replace deprecated methods > * etc > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24776) AVRO unit test: use SQLTestUtils and Replace deprecated methods
[ https://issues.apache.org/jira/browse/SPARK-24776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24776: Assignee: Apache Spark (was: Gengliang Wang) > AVRO unit test: use SQLTestUtils and Replace deprecated methods > --- > > Key: SPARK-24776 > URL: https://issues.apache.org/jira/browse/SPARK-24776 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > Fix For: 2.4.0 > > > * use SQLTestUtils > * Replace deprecated methods > * etc > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24776) AVRO unit test: use SQLTestUtils and Replace deprecated methods
[ https://issues.apache.org/jira/browse/SPARK-24776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544131#comment-16544131 ] Apache Spark commented on SPARK-24776: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/21768 > AVRO unit test: use SQLTestUtils and Replace deprecated methods > --- > > Key: SPARK-24776 > URL: https://issues.apache.org/jira/browse/SPARK-24776 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 2.4.0 > > > * use SQLTestUtils > * Replace deprecated methods > * etc > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
[ https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24804: Assignee: Apache Spark > There are duplicate words in the title in the DatasetSuite > -- > > Key: SPARK-24804 > URL: https://issues.apache.org/jira/browse/SPARK-24804 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: hantiantian >Assignee: Apache Spark >Priority: Minor > Fix For: 2.3.2 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
[ https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544101#comment-16544101 ] Apache Spark commented on SPARK-24804: -- User 'httfighter' has created a pull request for this issue: https://github.com/apache/spark/pull/21767 > There are duplicate words in the title in the DatasetSuite > -- > > Key: SPARK-24804 > URL: https://issues.apache.org/jira/browse/SPARK-24804 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: hantiantian >Priority: Minor > Fix For: 2.3.2 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
[ https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24804: Assignee: (was: Apache Spark) > There are duplicate words in the title in the DatasetSuite > -- > > Key: SPARK-24804 > URL: https://issues.apache.org/jira/browse/SPARK-24804 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: hantiantian >Priority: Minor > Fix For: 2.3.2 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24804) There are duplicate words in the title in the DatasetSuite
hantiantian created SPARK-24804: --- Summary: There are duplicate words in the title in the DatasetSuite Key: SPARK-24804 URL: https://issues.apache.org/jira/browse/SPARK-24804 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.2 Reporter: hantiantian Fix For: 2.3.2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24803) add support for numeric
[ https://issues.apache.org/jira/browse/SPARK-24803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangtao93 updated SPARK-24803: -- Priority: Major (was: Minor) > add support for numeric > --- > > Key: SPARK-24803 > URL: https://issues.apache.org/jira/browse/SPARK-24803 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.2 >Reporter: wangtao93 >Priority: Major > > NUMERIC is the same as DECIMAL. Spark already supports DECIMAL, so I > think we should add support for NUMERIC to align with the SQL standard. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
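The proposed aliasing can be sketched as follows. This is a hypothetical parser helper in plain Python, not Spark's actual type-resolution code; the function name and the `(10,0)` default precision/scale are assumptions for illustration. The point is simply that both spellings resolve to the same decimal type.

```python
import re

def parse_sql_type(name):
    """Resolve a SQL decimal type name, treating the SQL-standard
    NUMERIC(p, s) spelling as an alias for DECIMAL(p, s)."""
    m = re.fullmatch(
        r"(decimal|numeric)\s*(\(\s*\d+\s*(?:,\s*\d+\s*)?\))?",
        name.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unsupported type: {name}")
    # Normalize whitespace; fall back to an assumed default of (10,0).
    args = re.sub(r"\s+", "", m.group(2)) if m.group(2) else "(10,0)"
    return f"DecimalType{args}"

print(parse_sql_type("NUMERIC(10, 2)"))  # DecimalType(10,2)
print(parse_sql_type("decimal(10, 2)"))  # DecimalType(10,2)
```

Both type names map to one internal decimal type, which is the alignment with the SQL standard the ticket asks for.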