[jira] [Assigned] (SPARK-22892) Simplify some estimation logic by using double instead of decimal
[ https://issues.apache.org/jira/browse/SPARK-22892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22892: --- Assignee: Zhenhua Wang > Simplify some estimation logic by using double instead of decimal > - > > Key: SPARK-22892 > URL: https://issues.apache.org/jira/browse/SPARK-22892 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang >Assignee: Zhenhua Wang >Priority: Minor > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-22892) Simplify some estimation logic by using double instead of decimal
[ https://issues.apache.org/jira/browse/SPARK-22892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22892. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20062 [https://github.com/apache/spark/pull/20062] > Simplify some estimation logic by using double instead of decimal > - > > Key: SPARK-22892 > URL: https://issues.apache.org/jira/browse/SPARK-22892 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang >Priority: Minor > Fix For: 2.3.0 > >
[jira] [Assigned] (SPARK-22834) Make insert commands have real children to fix UI issues
[ https://issues.apache.org/jira/browse/SPARK-22834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22834: --- Assignee: Gengliang Wang > Make insert commands have real children to fix UI issues > > > Key: SPARK-22834 > URL: https://issues.apache.org/jira/browse/SPARK-22834 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang > Fix For: 2.3.0 > > > With https://github.com/apache/spark/pull/19474, children of insert commands > are missing in the UI. To fix this, create a new physical plan, > `DataWritingCommandExec`, to execute `DataWritingCommand` with children.
[jira] [Resolved] (SPARK-22834) Make insert commands have real children to fix UI issues
[ https://issues.apache.org/jira/browse/SPARK-22834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22834. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20020 [https://github.com/apache/spark/pull/20020] > Make insert commands have real children to fix UI issues > > > Key: SPARK-22834 > URL: https://issues.apache.org/jira/browse/SPARK-22834 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: Gengliang Wang > Fix For: 2.3.0 > > > With https://github.com/apache/spark/pull/19474, children of insert commands > are missing in the UI. To fix this, create a new physical plan, > `DataWritingCommandExec`, to execute `DataWritingCommand` with children.
[jira] [Resolved] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-22891. - Resolution: Fixed Assignee: Feng Liu Fix Version/s: 2.3.0 > NullPointerException when use udf > - > > Key: SPARK-22891 > URL: https://issues.apache.org/jira/browse/SPARK-22891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.2.1 > Environment: hadoop 2.7.2 >Reporter: gaoyang >Assignee: Feng Liu >Priority: Minor > Fix For: 2.3.0 > > > In my application, I use multiple threads. Each thread has a SparkSession and uses > SparkSession.sqlContext.udf.register to register my UDF. Sometimes an > exception like this is thrown: > {code:java} > java.lang.IllegalArgumentException: Error while instantiating > 'org.apache.spark.sql.hive.HiveSessionStateBuilder': > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136) > at > org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133) > at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207) > at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203) > at > com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63) > at > ... 
20 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35) > at > org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289) > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059) > ... 20 more > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744) > at > org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391) > at > org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210) > ... 
34 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769) > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736) > ... 36 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287) > at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156) > at com.sun.proxy.$Proxy25.isCompatibleWith(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:206) > at > org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:765) > ... 37 more > {code} > Also, I use Apache Hive 2.1.1 in my cluster. > When I use sp
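The stack trace above points at multiple threads racing to instantiate lazy session state concurrently. As a plain-Python illustration only of that general failure mode and its usual remedy (all names below are hypothetical, not Spark's actual code): guard the lazy initialization with a lock and re-check under it, so the expensive factory runs exactly once no matter how many threads hit it.

```python
import threading

class LazyResource:
    """Creates a shared resource lazily and exactly once, even under
    concurrent access -- the shape that avoids races like the one above."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._value = None

    def get(self):
        if self._value is None:          # fast path, no lock taken
            with self._lock:             # slow path, serialized
                if self._value is None:  # re-check under the lock
                    self._value = self._factory()
        return self._value

counter = {"calls": 0}

def make_resource():
    # Stands in for an expensive constructor such as session-state creation.
    counter["calls"] += 1
    return object()

res = LazyResource(make_resource)
threads = [threading.Thread(target=res.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter["calls"] == 1  # the factory ran exactly once
```

Spark's actual remedy may differ from this sketch; it only illustrates the concurrency-safe lazy-initialization pattern.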
[jira] [Assigned] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow
[ https://issues.apache.org/jira/browse/SPARK-22530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22530: Assignee: Apache Spark > Add ArrayType Support for working with Pandas and Arrow > --- > > Key: SPARK-22530 > URL: https://issues.apache.org/jira/browse/SPARK-22530 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler >Assignee: Apache Spark > > Adding ArrayType support for {{toPandas()}}, {{createDataFrame}} from Pandas, > and {{pandas_udf}}. I believe arrays are already supported on the > Java/Scala side, so we just need to complete this for Python.
[jira] [Assigned] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow
[ https://issues.apache.org/jira/browse/SPARK-22530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22530: Assignee: (was: Apache Spark) > Add ArrayType Support for working with Pandas and Arrow > --- > > Key: SPARK-22530 > URL: https://issues.apache.org/jira/browse/SPARK-22530 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler > > Adding ArrayType support for {{toPandas()}}, {{createDataFrame}} from Pandas, > and {{pandas_udf}}. I believe arrays are already supported on the > Java/Scala side, so we just need to complete this for Python.
[jira] [Commented] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow
[ https://issues.apache.org/jira/browse/SPARK-22530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306023#comment-16306023 ] Apache Spark commented on SPARK-22530: -- User 'BryanCutler' has created a pull request for this issue: https://github.com/apache/spark/pull/20114 > Add ArrayType Support for working with Pandas and Arrow > --- > > Key: SPARK-22530 > URL: https://issues.apache.org/jira/browse/SPARK-22530 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 2.3.0 >Reporter: Bryan Cutler > > Adding ArrayType support for {{toPandas()}}, {{createDataFrame}} from Pandas, > and {{pandas_udf}}. I believe arrays are already supported on the > Java/Scala side, so we just need to complete this for Python.
[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel save implementation
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306020#comment-16306020 ] zhengruifeng commented on SPARK-22905: -- [~WeichenXu123] I checked and found that the same issue exists in {{GaussianMixtureModel}}; otherwise it looks fine. > Fix ChiSqSelectorModel save implementation > -- > > Key: SPARK-22905 > URL: https://issues.apache.org/jira/browse/SPARK-22905 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.2.1 >Reporter: Weichen Xu >Assignee: Weichen Xu > Fix For: 2.3.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, in `ChiSqSelectorModel`, save: > {code} > spark.createDataFrame(dataArray).repartition(1).write... > {code} > The default partition number used by createDataFrame is "defaultParallelism", > and the current RoundRobinPartitioning won't guarantee that "repartition" > generates the same order as the local array. We need to fix it.
[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel save implementation
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306018#comment-16306018 ] Apache Spark commented on SPARK-22905: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/20113 > Fix ChiSqSelectorModel save implementation > -- > > Key: SPARK-22905 > URL: https://issues.apache.org/jira/browse/SPARK-22905 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.2.1 >Reporter: Weichen Xu >Assignee: Weichen Xu > Fix For: 2.3.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, in `ChiSqSelectorModel`, save: > {code} > spark.createDataFrame(dataArray).repartition(1).write... > {code} > The default partition number used by createDataFrame is "defaultParallelism", > and the current RoundRobinPartitioning won't guarantee that "repartition" > generates the same order as the local array. We need to fix it.
[jira] [Commented] (SPARK-22883) ML test for StructuredStreaming: spark.ml.feature, A-M
[ https://issues.apache.org/jira/browse/SPARK-22883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306014#comment-16306014 ] Joseph K. Bradley commented on SPARK-22883: --- https://github.com/apache/spark/pull/20111 is part 1 of 2 for this JIRA. > ML test for StructuredStreaming: spark.ml.feature, A-M > -- > > Key: SPARK-22883 > URL: https://issues.apache.org/jira/browse/SPARK-22883 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 2.3.0 >Reporter: Joseph K. Bradley > > *For featurizers with names from A - M* > Task for adding Structured Streaming tests for all Models/Transformers in a > sub-module in spark.ml > For an example, see LinearRegressionSuite.scala in > https://github.com/apache/spark/pull/19843
[jira] [Commented] (SPARK-22734) Create Python API for VectorSizeHint
[ https://issues.apache.org/jira/browse/SPARK-22734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305978#comment-16305978 ] Apache Spark commented on SPARK-22734: -- User 'MrBago' has created a pull request for this issue: https://github.com/apache/spark/pull/20112 > Create Python API for VectorSizeHint > > > Key: SPARK-22734 > URL: https://issues.apache.org/jira/browse/SPARK-22734 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 2.2.0 >Reporter: Bago Amirbekian >
[jira] [Commented] (SPARK-22722) Test Coverage for Type Coercion Compatibility
[ https://issues.apache.org/jira/browse/SPARK-22722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305979#comment-16305979 ] Yuming Wang commented on SPARK-22722: - [~smilegator] All tests are added, except [FunctionArgumentConversion|https://github.com/apache/spark/pull/20008#issuecomment-352670852] and [StackCoercion|https://github.com/apache/spark/pull/20006#pullrequestreview-84366891]. > Test Coverage for Type Coercion Compatibility > - > > Key: SPARK-22722 > URL: https://issues.apache.org/jira/browse/SPARK-22722 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Yuming Wang > > Hive compatibility is pretty important for users who run or migrate both > Hive and Spark SQL. > We plan to add a SQLConf for type coercion compatibility > (spark.sql.typeCoercion.mode). Users can choose Spark's native mode (default) > or Hive mode (hive). > Before we deliver the Hive compatibility mode, we plan to write a set of test > cases that can easily be run on both the Spark and Hive sides, so we can easily > compare whether the results are the same. When new typeCoercion rules are > added, we can also easily track the changes. These test cases can also be > backported to previous Spark versions for determining the changes we > made.
[jira] [Assigned] (SPARK-22734) Create Python API for VectorSizeHint
[ https://issues.apache.org/jira/browse/SPARK-22734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22734: Assignee: (was: Apache Spark) > Create Python API for VectorSizeHint > > > Key: SPARK-22734 > URL: https://issues.apache.org/jira/browse/SPARK-22734 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 2.2.0 >Reporter: Bago Amirbekian >
[jira] [Assigned] (SPARK-22734) Create Python API for VectorSizeHint
[ https://issues.apache.org/jira/browse/SPARK-22734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22734: Assignee: Apache Spark > Create Python API for VectorSizeHint > > > Key: SPARK-22734 > URL: https://issues.apache.org/jira/browse/SPARK-22734 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 2.2.0 >Reporter: Bago Amirbekian >Assignee: Apache Spark >
[jira] [Commented] (SPARK-22922) Python API for fitMultiple
[ https://issues.apache.org/jira/browse/SPARK-22922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305971#comment-16305971 ] Apache Spark commented on SPARK-22922: -- User 'MrBago' has created a pull request for this issue: https://github.com/apache/spark/pull/20058 > Python API for fitMultiple > -- > > Key: SPARK-22922 > URL: https://issues.apache.org/jira/browse/SPARK-22922 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 2.2.0 >Reporter: Bago Amirbekian > > Implement fitMultiple method on Estimator for pyspark.
[jira] [Assigned] (SPARK-22922) Python API for fitMultiple
[ https://issues.apache.org/jira/browse/SPARK-22922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22922: Assignee: Apache Spark > Python API for fitMultiple > -- > > Key: SPARK-22922 > URL: https://issues.apache.org/jira/browse/SPARK-22922 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 2.2.0 >Reporter: Bago Amirbekian >Assignee: Apache Spark > > Implement fitMultiple method on Estimator for pyspark.
[jira] [Assigned] (SPARK-22922) Python API for fitMultiple
[ https://issues.apache.org/jira/browse/SPARK-22922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22922: Assignee: (was: Apache Spark) > Python API for fitMultiple > -- > > Key: SPARK-22922 > URL: https://issues.apache.org/jira/browse/SPARK-22922 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 2.2.0 >Reporter: Bago Amirbekian > > Implement fitMultiple method on Estimator for pyspark.
[jira] [Created] (SPARK-22922) Python API for fitMultiple
Bago Amirbekian created SPARK-22922: --- Summary: Python API for fitMultiple Key: SPARK-22922 URL: https://issues.apache.org/jira/browse/SPARK-22922 Project: Spark Issue Type: Improvement Components: ML, PySpark Affects Versions: 2.2.0 Reporter: Bago Amirbekian Implement fitMultiple method on Estimator for pyspark.
[jira] [Commented] (SPARK-22883) ML test for StructuredStreaming: spark.ml.feature, A-M
[ https://issues.apache.org/jira/browse/SPARK-22883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305948#comment-16305948 ] Apache Spark commented on SPARK-22883: -- User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/20111 > ML test for StructuredStreaming: spark.ml.feature, A-M > -- > > Key: SPARK-22883 > URL: https://issues.apache.org/jira/browse/SPARK-22883 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 2.3.0 >Reporter: Joseph K. Bradley > > *For featurizers with names from A - M* > Task for adding Structured Streaming tests for all Models/Transformers in a > sub-module in spark.ml > For an example, see LinearRegressionSuite.scala in > https://github.com/apache/spark/pull/19843
[jira] [Assigned] (SPARK-22883) ML test for StructuredStreaming: spark.ml.feature, A-M
[ https://issues.apache.org/jira/browse/SPARK-22883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22883: Assignee: Apache Spark > ML test for StructuredStreaming: spark.ml.feature, A-M > -- > > Key: SPARK-22883 > URL: https://issues.apache.org/jira/browse/SPARK-22883 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 2.3.0 >Reporter: Joseph K. Bradley >Assignee: Apache Spark > > *For featurizers with names from A - M* > Task for adding Structured Streaming tests for all Models/Transformers in a > sub-module in spark.ml > For an example, see LinearRegressionSuite.scala in > https://github.com/apache/spark/pull/19843
[jira] [Assigned] (SPARK-22883) ML test for StructuredStreaming: spark.ml.feature, A-M
[ https://issues.apache.org/jira/browse/SPARK-22883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22883: Assignee: (was: Apache Spark) > ML test for StructuredStreaming: spark.ml.feature, A-M > -- > > Key: SPARK-22883 > URL: https://issues.apache.org/jira/browse/SPARK-22883 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 2.3.0 >Reporter: Joseph K. Bradley > > *For featurizers with names from A - M* > Task for adding Structured Streaming tests for all Models/Transformers in a > sub-module in spark.ml > For an example, see LinearRegressionSuite.scala in > https://github.com/apache/spark/pull/19843
[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel save implementation
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305931#comment-16305931 ] Weichen Xu commented on SPARK-22905: [~podongfeng] Some of them save only one row, so there's no bug; some cases include a row-number column and sort when reading to get a stable order. But I am not sure whether I missed any cases; it would be great if you could help check. > Fix ChiSqSelectorModel save implementation > -- > > Key: SPARK-22905 > URL: https://issues.apache.org/jira/browse/SPARK-22905 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.2.1 >Reporter: Weichen Xu >Assignee: Weichen Xu > Fix For: 2.3.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, in `ChiSqSelectorModel`, save: > {code} > spark.createDataFrame(dataArray).repartition(1).write... > {code} > The default partition number used by createDataFrame is "defaultParallelism", > and the current RoundRobinPartitioning won't guarantee that "repartition" > generates the same order as the local array. We need to fix it.
[jira] [Commented] (SPARK-22313) Mark/print deprecation warnings as DeprecationWarning for deprecated APIs
[ https://issues.apache.org/jira/browse/SPARK-22313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305917#comment-16305917 ] Apache Spark commented on SPARK-22313: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/20110 > Mark/print deprecation warnings as DeprecationWarning for deprecated APIs > - > > Key: SPARK-22313 > URL: https://issues.apache.org/jira/browse/SPARK-22313 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.3.0 > > > Currently, some {{warnings.warn(...)}} calls for deprecation use the default > category {{UserWarning}}. > If we use {{DeprecationWarning}}, this can actually be useful in an IDE; in my > case, PyCharm. Please see before and after in the PR. I happened to open a PR > first to show my idea. > Also, it looks like some deprecated functions do not emit these warnings. It > might be better to print those out explicitly.
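The category difference the issue describes can be reproduced with Python's standard {{warnings}} module alone; the function name below is made up for illustration and is not part of PySpark:

```python
import warnings

def old_api():
    # Hypothetical deprecated function. Passing DeprecationWarning (rather
    # than relying on the default UserWarning) lets IDEs and linters flag
    # the call site; stacklevel=2 attributes the warning to the caller.
    warnings.warn("old_api is deprecated; use new_api instead.",
                  DeprecationWarning, stacklevel=2)
    return 42

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record even normally-ignored categories
    result = old_api()

# The recorded warning carries the DeprecationWarning category.
assert caught[0].category is DeprecationWarning
```

Note that CPython ignores `DeprecationWarning` by default outside `__main__`, which is why the example forces recording with `simplefilter("always")`.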
[jira] [Comment Edited] (SPARK-22905) Fix ChiSqSelectorModel save implementation
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305905#comment-16305905 ] zhengruifeng edited comment on SPARK-22905 at 12/29/17 2:08 AM: Many other models are saved in the same way {{sparkSession.createDataFrame(...).repartition(1).write.parquet}}; do they need to be fixed? was (Author: podongfeng): Many other models are saved in the same way {sparkSession.createDataFrame(...).repartition(1).write.parquet}, are they needed to be fixed? > Fix ChiSqSelectorModel save implementation > -- > > Key: SPARK-22905 > URL: https://issues.apache.org/jira/browse/SPARK-22905 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.2.1 >Reporter: Weichen Xu >Assignee: Weichen Xu > Fix For: 2.3.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, in `ChiSqSelectorModel`, save: > {code} > spark.createDataFrame(dataArray).repartition(1).write... > {code} > The default partition number used by createDataFrame is "defaultParallelism", > and the current RoundRobinPartitioning won't guarantee that "repartition" > generates the same order as the local array. We need to fix it.
[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel save implementation
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305905#comment-16305905 ] zhengruifeng commented on SPARK-22905: -- Many other models are saved in the same way {{sparkSession.createDataFrame(...).repartition(1).write.parquet}}; do they need to be fixed? > Fix ChiSqSelectorModel save implementation > -- > > Key: SPARK-22905 > URL: https://issues.apache.org/jira/browse/SPARK-22905 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.2.1 >Reporter: Weichen Xu >Assignee: Weichen Xu > Fix For: 2.3.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, in `ChiSqSelectorModel`, save: > {code} > spark.createDataFrame(dataArray).repartition(1).write... > {code} > The default partition number used by createDataFrame is "defaultParallelism", > and the current RoundRobinPartitioning won't guarantee that "repartition" > generates the same order as the local array. We need to fix it.
[jira] [Resolved] (SPARK-22905) Fix ChiSqSelectorModel save implementation
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-22905. --- Resolution: Fixed Fix Version/s: 2.3.0 Resolved by https://github.com/apache/spark/pull/20088 > Fix ChiSqSelectorModel save implementation > -- > > Key: SPARK-22905 > URL: https://issues.apache.org/jira/browse/SPARK-22905 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.2.1 >Reporter: Weichen Xu >Assignee: Weichen Xu > Fix For: 2.3.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, in `ChiSqSelectorModel`, save: > {code} > spark.createDataFrame(dataArray).repartition(1).write... > {code} > The default partition number used by createDataFrame is "defaultParallelism", > and the current RoundRobinPartitioning won't guarantee that "repartition" > generates the same order as the local array. We need to fix it.
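The ordering hazard described in this issue, and the usual remedy of carrying an explicit row index, can be sketched in plain Python. This only simulates a shuffle delivering rows in arbitrary order; it is not Spark's implementation, and the data values are made up:

```python
import random

# Rows produced on the driver, in the order the model expects on reload.
data = ["coef0", "coef1", "coef2", "coef3", "coef4", "coef5"]

# Attach a stable index before writing, so the original order can be
# restored even if a shuffle (such as repartition) reorders the rows.
indexed = list(enumerate(data))

# Simulate the nondeterministic arrival order of shuffled rows.
random.shuffle(indexed)

# Sorting by the saved index restores the original order deterministically.
restored = [value for _, value in sorted(indexed)]
assert restored == data
```

Without the index column, `restored` would be whatever order the simulated shuffle happened to produce, which is the bug the PR above guards against.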
[jira] [Commented] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305881#comment-16305881 ] Feng Liu commented on SPARK-22891: -- A side note: if we don't want to merge https://github.com/apache/spark/pull/20029, we should make the creation of the Hive client lazy inside the HiveSessionResourceLoader: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L123 Hive client creation is known to be expensive, so it does not make sense to materialize it if we don't use it. > NullPointerException when use udf > - > > Key: SPARK-22891 > URL: https://issues.apache.org/jira/browse/SPARK-22891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.2.1 > Environment: hadoop 2.7.2 >Reporter: gaoyang >Priority: Minor > > In my application, I use multiple threads. Each thread has a SparkSession and uses > SparkSession.sqlContext.udf.register to register my UDF. Sometimes an > exception like this is thrown: > {code:java} > java.lang.IllegalArgumentException: Error while instantiating > 'org.apache.spark.sql.hive.HiveSessionStateBuilder': > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136) > at > org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133) > at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207) > at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203) > at > com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63) > at > ... 
20 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35) > at > org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289) > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059) > ... 20 more > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744) > at > org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391) > at > org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:210) > ... 
34 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769) > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736) > ... 36 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287) > at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156) > at com.sun.proxy.$Proxy25.isCompatibleWith(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:206) > at > org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:765) > ... 37 more > {code} > Also, I use Apache Hive 2.1.1 in my cluster. > When I use Spark 2.1.x, the exception above never happens.
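Feng Liu's suggestion above, deferring the expensive Hive client creation until something actually needs it, comes down to memoized lazy initialization. A minimal plain-Java sketch under assumed names (`ExpensiveClient` and `ResourceLoader` are illustrative stand-ins, not Spark's actual API):

```java
import java.util.function.Supplier;

// Stand-in for the costly Hive client; counts how many instances get built.
class ExpensiveClient {
    static int creations = 0;
    ExpensiveClient() { creations++; }
}

// Stand-in for a session resource loader that memoizes its client:
// nothing is materialized until the first call to client().
// (Not thread safe; a real builder would also guard this, see below.)
class ResourceLoader {
    private final Supplier<ExpensiveClient> factory;
    private ExpensiveClient client;

    ResourceLoader(Supplier<ExpensiveClient> factory) { this.factory = factory; }

    ExpensiveClient client() {
        if (client == null) {
            client = factory.get();   // built on first use only
        }
        return client;                // reused on every later call
    }
}
```

A session that never touches the loader pays nothing for the client; repeated calls reuse the single instance.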
[jira] [Commented] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305874#comment-16305874 ] Apache Spark commented on SPARK-22891: -- User 'liufengdb' has created a pull request for this issue: https://github.com/apache/spark/pull/20109
[jira] [Assigned] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22891: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22891: Assignee: Apache Spark
[jira] [Comment Edited] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305862#comment-16305862 ] Feng Liu edited comment on SPARK-22891 at 12/29/17 12:56 AM: - This is definitely caused by the race from https://issues.apache.org/jira/browse/HIVE-11935. In Spark 2.1, Spark defers creating `metadataHive` until `addJar` (https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40), so the race can only be triggered by concurrent `addJar` calls, which is hard to imagine in practice. In Spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` creation (see the stack trace), so it is now triggered by new Spark session creation. In https://github.com/apache/spark/pull/20029, I argue that it is safe to remove the new Hive client creation. Besides that change, I think we should also make the Hive client creation thread safe: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251
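The thread-safety fix Feng Liu proposes for the client creation in IsolatedClientLoader amounts to serializing a check-then-act on shared state: two sessions that both observe "no client yet" must not both build one. A minimal plain-Java sketch of the racy shape and the synchronized one (hypothetical `ClientLoader` name, not Spark's actual code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// A cached client shared across sessions; its creation must not race.
class ClientLoader {
    private Object cachedClient;                         // shared mutable state
    private final AtomicInteger creations = new AtomicInteger();

    // Racy: two threads can both see null and both create a client
    // (the HIVE-11935-style check-then-act problem).
    Object createClientUnsafe() {
        if (cachedClient == null) {
            creations.incrementAndGet();
            cachedClient = new Object();
        }
        return cachedClient;
    }

    // Safe: the whole check-then-act runs under the loader's lock,
    // so at most one client is ever created.
    synchronized Object createClientSafe() {
        if (cachedClient == null) {
            creations.incrementAndGet();
            cachedClient = new Object();
        }
        return cachedClient;
    }

    int creations() { return creations.get(); }
}
```

With the synchronized variant, any number of concurrent sessions observe exactly one client creation.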
[jira] [Updated] (SPARK-14922) Alter Table Drop Partition Using Predicate-based Partition Spec
[ https://issues.apache.org/jira/browse/SPARK-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-14922: -- Affects Version/s: 2.1.2 2.2.1 > Alter Table Drop Partition Using Predicate-based Partition Spec > --- > > Key: SPARK-14922 > URL: https://issues.apache.org/jira/browse/SPARK-14922 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.2, 2.2.1 >Reporter: Xiao Li > > Below is allowed in Hive, but not allowed in Spark. > {noformat} > alter table ptestfilter drop partition (c='US', d<'2') > {noformat} > This example is copied from drop_partitions_filter.q -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
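A predicate-based partition spec such as `(c='US', d<'2')` selects every partition whose values satisfy all of the comparisons. A minimal sketch of that matching logic in plain Java (hypothetical representation; for these specs Hive compares partition values as strings):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of matching partitions against a predicate-based drop spec
// like (c='US', d<'2'). Partition values are compared as strings.
class PartitionSpecFilter {

    // One comparison from the spec, e.g. d < "2".
    static class Comparison {
        final String column, op, value;
        Comparison(String column, String op, String value) {
            this.column = column; this.op = op; this.value = value;
        }
        boolean matches(Map<String, String> partition) {
            int cmp = partition.get(column).compareTo(value);
            if (op.equals("="))  return cmp == 0;
            if (op.equals("<"))  return cmp < 0;
            if (op.equals(">"))  return cmp > 0;
            if (op.equals("<=")) return cmp <= 0;
            if (op.equals(">=")) return cmp >= 0;
            throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    // A partition is dropped only if it satisfies every comparison in the spec.
    static List<Map<String, String>> select(List<Map<String, String>> partitions,
                                            List<Comparison> spec) {
        List<Map<String, String>> matched = new ArrayList<Map<String, String>>();
        for (Map<String, String> p : partitions) {
            boolean all = true;
            for (Comparison c : spec) all = all && c.matches(p);
            if (all) matched.add(p);
        }
        return matched;
    }
}
```

So for partitions {c=US,d=1}, {c=US,d=3}, {c=KR,d=1}, the spec `(c='US', d<'2')` drops only the first.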
[jira] [Resolved] (SPARK-22818) csv escape of quote escape
[ https://issues.apache.org/jira/browse/SPARK-22818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-22818. - Resolution: Fixed Assignee: Soonmok Kwon Fix Version/s: 2.3.0 > csv escape of quote escape > -- > > Key: SPARK-22818 > URL: https://issues.apache.org/jira/browse/SPARK-22818 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: Soonmok Kwon >Assignee: Soonmok Kwon >Priority: Minor > Fix For: 2.3.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > A DataFrame is stored in CSV format and loaded again. When there is a backslash > followed by a quotation mark, CSV reading seems to produce an error. > This issue was raised before in > https://issues.apache.org/jira/browse/SPARK-19834 and was postponed due to a bug > in its dependency. That bug is now resolved, so this issue can be reopened.
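The bug report is about round-tripping fields that contain the quote or the escape character itself: a writer that escapes quotes but not the escape character produces cells a reader cannot parse back. A minimal plain-Java illustration of "escape of quote escape" (illustrative only, not Spark's or univocity's actual parser):

```java
// Toy CSV cell writer/reader using backslash as the escape character.
class CsvEscape {
    static final char QUOTE = '"';
    static final char ESC = '\\';

    // Write a field: quote it, escaping the quote AND the escape char itself.
    // Skipping the second part is exactly what breaks round-trips of
    // values with a backslash right before the closing quote.
    static String write(String field) {
        StringBuilder sb = new StringBuilder().append(QUOTE);
        for (char ch : field.toCharArray()) {
            if (ch == QUOTE || ch == ESC) sb.append(ESC);
            sb.append(ch);
        }
        return sb.append(QUOTE).toString();
    }

    // Read a cell back: an escape char always consumes the next char literally.
    static String read(String cell) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < cell.length() - 1; i++) {
            char ch = cell.charAt(i);
            if (ch == ESC) { i++; ch = cell.charAt(i); }
            sb.append(ch);
        }
        return sb.toString();
    }
}
```

For example, the field `trailing\` must be written as `"trailing\\"`; if the writer emitted `"trailing\"`, a reader would treat `\"` as an escaped quote and run past the end of the cell.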
[jira] [Commented] (SPARK-22921) Merge script should prompt for assigning jiras
[ https://issues.apache.org/jira/browse/SPARK-22921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305814#comment-16305814 ] Apache Spark commented on SPARK-22921: -- User 'squito' has created a pull request for this issue: https://github.com/apache/spark/pull/20107 > Merge script should prompt for assigning jiras > -- > > Key: SPARK-22921 > URL: https://issues.apache.org/jira/browse/SPARK-22921 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 2.3.0 >Reporter: Imran Rashid >Priority: Trivial > > It's a bit of a nuisance to have to go into JIRA to assign the issue when you > merge a PR. In general you assign to either the original reporter or a > commenter; it would be nice if the merge script made that easy to do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22921) Merge script should prompt for assigning jiras
[ https://issues.apache.org/jira/browse/SPARK-22921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22921: Assignee: Apache Spark > Merge script should prompt for assigning jiras > -- > > Key: SPARK-22921 > URL: https://issues.apache.org/jira/browse/SPARK-22921 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 2.3.0 >Reporter: Imran Rashid >Assignee: Apache Spark >Priority: Trivial > > It's a bit of a nuisance to have to go into JIRA to assign the issue when you > merge a PR. In general you assign to either the original reporter or a > commenter; it would be nice if the merge script made that easy to do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22921) Merge script should prompt for assigning jiras
[ https://issues.apache.org/jira/browse/SPARK-22921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22921: Assignee: (was: Apache Spark) > Merge script should prompt for assigning jiras > -- > > Key: SPARK-22921 > URL: https://issues.apache.org/jira/browse/SPARK-22921 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 2.3.0 >Reporter: Imran Rashid >Priority: Trivial > > It's a bit of a nuisance to have to go into JIRA to assign the issue when you > merge a PR. In general you assign to either the original reporter or a > commenter; it would be nice if the merge script made that easy to do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-22904) Basic tests for decimal operations and string cast
[ https://issues.apache.org/jira/browse/SPARK-22904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-22904. - Resolution: Fixed Assignee: Marco Gaido Fix Version/s: 2.3.0 > Basic tests for decimal operations and string cast > -- > > Key: SPARK-22904 > URL: https://issues.apache.org/jira/browse/SPARK-22904 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Marco Gaido >Assignee: Marco Gaido > Fix For: 2.3.0 > > > Tests covering Spark behavior with decimal operations which cause overflow or > precision loss and covering casting invalid strings to other data types or > passing invalid strings to some functions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-11035) Launcher: allow apps to be launched in-process
[ https://issues.apache.org/jira/browse/SPARK-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-11035. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19591 [https://github.com/apache/spark/pull/19591] > Launcher: allow apps to be launched in-process > -- > > Key: SPARK-11035 > URL: https://issues.apache.org/jira/browse/SPARK-11035 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 2.3.0 > > > The launcher library is currently restricted to launching apps as child > processes. That is fine for a lot of cases, especially if the app is running > in client mode. > But in certain cases, especially launching in cluster mode, it's more > efficient to avoid launching a new process, since that process won't be doing > much. > We should add support for launching apps in process, even if restricted to > cluster mode at first. This will require some rework of the launch paths to > avoid using system properties to propagate configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-22890) Basic tests for DateTimeOperations
[ https://issues.apache.org/jira/browse/SPARK-22890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-22890. - Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 2.3.0 > Basic tests for DateTimeOperations > -- > > Key: SPARK-22890 > URL: https://issues.apache.org/jira/browse/SPARK-22890 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22921) Merge script should prompt for assigning jiras
Imran Rashid created SPARK-22921: Summary: Merge script should prompt for assigning jiras Key: SPARK-22921 URL: https://issues.apache.org/jira/browse/SPARK-22921 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 2.3.0 Reporter: Imran Rashid Priority: Trivial It's a bit of a nuisance to have to go into JIRA to assign the issue when you merge a PR. In general you assign to either the original reporter or a commenter; it would be nice if the merge script made that easy to do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11035) Launcher: allow apps to be launched in-process
[ https://issues.apache.org/jira/browse/SPARK-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-11035: Assignee: Marcelo Vanzin > Launcher: allow apps to be launched in-process > -- > > Key: SPARK-11035 > URL: https://issues.apache.org/jira/browse/SPARK-11035 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > > The launcher library is currently restricted to launching apps as child > processes. That is fine for a lot of cases, especially if the app is running > in client mode. > But in certain cases, especially launching in cluster mode, it's more > efficient to avoid launching a new process, since that process won't be doing > much. > We should add support for launching apps in process, even if restricted to > cluster mode at first. This will require some rework of the launch paths to > avoid using system properties to propagate configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11035) Launcher: allow apps to be launched in-process
[ https://issues.apache.org/jira/browse/SPARK-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-11035: Assignee: (was: Marcelo Vanzin) > Launcher: allow apps to be launched in-process > -- > > Key: SPARK-11035 > URL: https://issues.apache.org/jira/browse/SPARK-11035 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Marcelo Vanzin > > The launcher library is currently restricted to launching apps as child > processes. That is fine for a lot of cases, especially if the app is running > in client mode. > But in certain cases, especially launching in cluster mode, it's more > efficient to avoid launching a new process, since that process won't be doing > much. > We should add support for launching apps in process, even if restricted to > cluster mode at first. This will require some rework of the launch paths to > avoid using system properties to propagate configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11035) Launcher: allow apps to be launched in-process
[ https://issues.apache.org/jira/browse/SPARK-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-11035: Assignee: Marcelo Vanzin > Launcher: allow apps to be launched in-process > -- > > Key: SPARK-11035 > URL: https://issues.apache.org/jira/browse/SPARK-11035 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > > The launcher library is currently restricted to launching apps as child > processes. That is fine for a lot of cases, especially if the app is running > in client mode. > But in certain cases, especially launching in cluster mode, it's more > efficient to avoid launching a new process, since that process won't be doing > much. > We should add support for launching apps in process, even if restricted to > cluster mode at first. This will require some rework of the launch paths to > avoid using system properties to propagate configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-12297: Assignee: Imran Rashid (was: Marcelo Vanzin) > Add work-around for Parquet/Hive int96 timestamp bug. > - > > Key: SPARK-12297 > URL: https://issues.apache.org/jira/browse/SPARK-12297 > Project: Spark > Issue Type: Task > Components: Spark Core >Reporter: Ryan Blue >Assignee: Imran Rashid > Fix For: 2.3.0 > > > Spark copied Hive's behavior for parquet, but this was inconsistent with > other file formats, and inconsistent with Impala (which is the original > source of putting a timestamp as an int96 in parquet, I believe). This made > timestamps in parquet act more like timestamps with timezones, while in other > file formats, timestamps have no time zone, they are a "floating time". > The easiest way to see this issue is to write out a table with timestamps in > multiple different formats from one timezone, then try to read them back in > another timezone. 
Eg., here I write out a few timestamps to parquet and > textfile hive tables, and also just as a json file, all in the > "America/Los_Angeles" timezone: > {code} > import org.apache.spark.sql.Row > import org.apache.spark.sql.types._ > val tblPrefix = args(0) > val schema = new StructType().add("ts", TimestampType) > val rows = sc.parallelize(Seq( > "2015-12-31 23:50:59.123", > "2015-12-31 22:49:59.123", > "2016-01-01 00:39:59.123", > "2016-01-01 01:29:59.123" > ).map { x => Row(java.sql.Timestamp.valueOf(x)) }) > val rawData = spark.createDataFrame(rows, schema).toDF() > rawData.show() > Seq("parquet", "textfile").foreach { format => > val tblName = s"${tblPrefix}_$format" > spark.sql(s"DROP TABLE IF EXISTS $tblName") > spark.sql( > raw"""CREATE TABLE $tblName ( > | ts timestamp > | ) > | STORED AS $format > """.stripMargin) > rawData.write.insertInto(tblName) > } > rawData.write.json(s"${tblPrefix}_json") > {code} > Then I start a spark-shell in "America/New_York" timezone, and read the data > back from each table: > {code} > scala> spark.sql("select * from la_parquet").collect().foreach{println} > [2016-01-01 02:50:59.123] > [2016-01-01 01:49:59.123] > [2016-01-01 03:39:59.123] > [2016-01-01 04:29:59.123] > scala> spark.sql("select * from la_textfile").collect().foreach{println} > [2015-12-31 23:50:59.123] > [2015-12-31 22:49:59.123] > [2016-01-01 00:39:59.123] > [2016-01-01 01:29:59.123] > scala> spark.read.json("la_json").collect().foreach{println} > [2015-12-31 23:50:59.123] > [2015-12-31 22:49:59.123] > [2016-01-01 00:39:59.123] > [2016-01-01 01:29:59.123] > scala> spark.read.json("la_json").join(spark.sql("select * from > la_textfile"), "ts").show() > ++ > | ts| > ++ > |2015-12-31 23:50:...| > |2015-12-31 22:49:...| > |2016-01-01 00:39:...| > |2016-01-01 01:29:...| > ++ > scala> spark.read.json("la_json").join(spark.sql("select * from la_parquet"), > "ts").show() > +---+ > | ts| > +---+ > +---+ > {code} > The textfile and json based data shows the same 
times, and can be joined > against each other, while the times from the parquet data have changed (and > obviously joins fail). > This is a big problem for any organization that may try to read the same data > (say in S3) with clusters in multiple timezones. It can also be a nasty > surprise as an organization tries to migrate file formats. Finally, it's a > source of incompatibility between Hive, Impala, and Spark. > HIVE-12767 aims to fix this by introducing a table property which indicates > the "storage timezone" for the table. Spark should add the same to ensure > consistency between file formats, and with Hive & Impala. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
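The two timestamp semantics in the description above can be sketched with Python's zoneinfo (an analogy, not Spark code): an int96-style instant shifts when rendered in the reader's timezone, while a "floating" timestamp reads back unchanged.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

la = ZoneInfo("America/Los_Angeles")
ny = ZoneInfo("America/New_York")

# Instant semantics (parquet int96 as Spark treats it): the same point on
# the timeline renders differently under another session timezone.
instant = datetime(2015, 12, 31, 23, 50, 59, tzinfo=la)
print(instant.astimezone(ny))  # 2016-01-01 02:50:59-05:00

# Floating semantics (the textfile/json tables in the description): a bare
# wall-clock value, unchanged regardless of the reader's timezone.
floating = datetime(2015, 12, 31, 23, 50, 59)
print(floating)                # 2015-12-31 23:50:59
```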
[jira] [Assigned] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21616: Assignee: Apache Spark (was: Felix Cheung) > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Apache Spark > > From looking at changes since 2.2.0, this/these should be documented in the > migration guide / release note for the 2.3.0 release, as it is behavior > changes -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21616: Assignee: Felix Cheung (was: Apache Spark) > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > > From looking at changes since 2.2.0, this/these should be documented in the > migration guide / release note for the 2.3.0 release, as it is behavior > changes -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-12297: Assignee: Marcelo Vanzin (was: Imran Rashid) > Add work-around for Parquet/Hive int96 timestamp bug. > - > > Key: SPARK-12297 > URL: https://issues.apache.org/jira/browse/SPARK-12297 > Project: Spark > Issue Type: Task > Components: Spark Core >Reporter: Ryan Blue >Assignee: Marcelo Vanzin > Fix For: 2.3.0 > > > Spark copied Hive's behavior for parquet, but this was inconsistent with > other file formats, and inconsistent with Impala (which is the original > source of putting a timestamp as an int96 in parquet, I believe). This made > timestamps in parquet act more like timestamps with timezones, while in other > file formats, timestamps have no time zone, they are a "floating time". > The easiest way to see this issue is to write out a table with timestamps in > multiple different formats from one timezone, then try to read them back in > another timezone. 
Eg., here I write out a few timestamps to parquet and > textfile hive tables, and also just as a json file, all in the > "America/Los_Angeles" timezone: > {code} > import org.apache.spark.sql.Row > import org.apache.spark.sql.types._ > val tblPrefix = args(0) > val schema = new StructType().add("ts", TimestampType) > val rows = sc.parallelize(Seq( > "2015-12-31 23:50:59.123", > "2015-12-31 22:49:59.123", > "2016-01-01 00:39:59.123", > "2016-01-01 01:29:59.123" > ).map { x => Row(java.sql.Timestamp.valueOf(x)) }) > val rawData = spark.createDataFrame(rows, schema).toDF() > rawData.show() > Seq("parquet", "textfile").foreach { format => > val tblName = s"${tblPrefix}_$format" > spark.sql(s"DROP TABLE IF EXISTS $tblName") > spark.sql( > raw"""CREATE TABLE $tblName ( > | ts timestamp > | ) > | STORED AS $format > """.stripMargin) > rawData.write.insertInto(tblName) > } > rawData.write.json(s"${tblPrefix}_json") > {code} > Then I start a spark-shell in "America/New_York" timezone, and read the data > back from each table: > {code} > scala> spark.sql("select * from la_parquet").collect().foreach{println} > [2016-01-01 02:50:59.123] > [2016-01-01 01:49:59.123] > [2016-01-01 03:39:59.123] > [2016-01-01 04:29:59.123] > scala> spark.sql("select * from la_textfile").collect().foreach{println} > [2015-12-31 23:50:59.123] > [2015-12-31 22:49:59.123] > [2016-01-01 00:39:59.123] > [2016-01-01 01:29:59.123] > scala> spark.read.json("la_json").collect().foreach{println} > [2015-12-31 23:50:59.123] > [2015-12-31 22:49:59.123] > [2016-01-01 00:39:59.123] > [2016-01-01 01:29:59.123] > scala> spark.read.json("la_json").join(spark.sql("select * from > la_textfile"), "ts").show() > ++ > | ts| > ++ > |2015-12-31 23:50:...| > |2015-12-31 22:49:...| > |2016-01-01 00:39:...| > |2016-01-01 01:29:...| > ++ > scala> spark.read.json("la_json").join(spark.sql("select * from la_parquet"), > "ts").show() > +---+ > | ts| > +---+ > +---+ > {code} > The textfile and json based data shows the same 
times, and can be joined > against each other, while the times from the parquet data have changed (and > obviously joins fail). > This is a big problem for any organization that may try to read the same data > (say in S3) with clusters in multiple timezones. It can also be a nasty > surprise as an organization tries to migrate file formats. Finally, it's a > source of incompatibility between Hive, Impala, and Spark. > HIVE-12767 aims to fix this by introducing a table property which indicates > the "storage timezone" for the table. Spark should add the same to ensure > consistency between file formats, and with Hive & Impala. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305789#comment-16305789 ] Apache Spark commented on SPARK-21616: -- User 'felixcheung' has created a pull request for this issue: https://github.com/apache/spark/pull/20106 > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > > From looking at changes since 2.2.0, this/these should be documented in the > migration guide / release note for the 2.3.0 release, as it is behavior > changes -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22920) R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString
[ https://issues.apache.org/jira/browse/SPARK-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305762#comment-16305762 ] Apache Spark commented on SPARK-22920: -- User 'felixcheung' has created a pull request for this issue: https://github.com/apache/spark/pull/20105 > R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with > trimString > - > > Key: SPARK-22920 > URL: https://issues.apache.org/jira/browse/SPARK-22920 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22920) R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString
[ https://issues.apache.org/jira/browse/SPARK-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22920: Assignee: Apache Spark > R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with > trimString > - > > Key: SPARK-22920 > URL: https://issues.apache.org/jira/browse/SPARK-22920 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22920) R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString
[ https://issues.apache.org/jira/browse/SPARK-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22920: Assignee: (was: Apache Spark) > R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with > trimString > - > > Key: SPARK-22920 > URL: https://issues.apache.org/jira/browse/SPARK-22920 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22920) R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString
Felix Cheung created SPARK-22920: Summary: R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString Key: SPARK-22920 URL: https://issues.apache.org/jira/browse/SPARK-22920 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 2.3.0 Reporter: Felix Cheung -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-22836) Executors page is not showing driver logs links
[ https://issues.apache.org/jira/browse/SPARK-22836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-22836. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20038 [https://github.com/apache/spark/pull/20038] > Executors page is not showing driver logs links > --- > > Key: SPARK-22836 > URL: https://issues.apache.org/jira/browse/SPARK-22836 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin > Fix For: 2.3.0 > > > -I think this was mainly caused by SPARK-15951; that change modified the > executors page to read data from the REST API, and in 2.1 and 2.2 the REST > API does not return the driver as an executor. So no information about the > driver is shown in that page at all.- (Edit: this bug is unrelated to the > aforementioned bug.) > In 2.3 the driver executor is listed, but it doesn't have any log links. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22836) Executors page is not showing driver logs links
[ https://issues.apache.org/jira/browse/SPARK-22836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-22836: Assignee: Marcelo Vanzin > Executors page is not showing driver logs links > --- > > Key: SPARK-22836 > URL: https://issues.apache.org/jira/browse/SPARK-22836 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 2.3.0 > > > -I think this was mainly caused by SPARK-15951; that change modified the > executors page to read data from the REST API, and in 2.1 and 2.2 the REST > API does not return the driver as an executor. So no information about the > driver is shown in that page at all.- (Edit: this bug is unrelated to the > aforementioned bug.) > In 2.3 the driver executor is listed, but it doesn't have any log links. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22905) Fix ChiSqSelectorModel save implementation
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley reassigned SPARK-22905: - Assignee: Weichen Xu > Fix ChiSqSelectorModel save implementation > -- > > Key: SPARK-22905 > URL: https://issues.apache.org/jira/browse/SPARK-22905 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.2.1 >Reporter: Weichen Xu >Assignee: Weichen Xu > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, `ChiSqSelectorModel` save does: > {code} > spark.createDataFrame(dataArray).repartition(1).write... > {code} > The default partition number used by createDataFrame is "defaultParallelism", and > the current RoundRobinPartitioning won't guarantee that "repartition" produces > the same ordering as the local array. We need to fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
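Why redistribution can scramble row order can be shown with a small Python model (a sketch of round-robin dealing, not Spark's actual shuffle): rows dealt round-robin into partitions and then concatenated come back in partition order, not input order, so recovering the original order requires an explicit index (one possible remedy, shown below).

```python
def round_robin(rows, n_partitions):
    """Deal rows into n_partitions the way a round-robin partitioner would."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts

rows = list(range(10))

# Concatenating the dealt partitions does not recover the input order.
merged = [r for part in round_robin(rows, 3) for r in part]
print(merged)  # [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]

# Carrying an explicit index and sorting on it makes the order recoverable.
indexed = list(enumerate(rows))
shuffled = [t for part in round_robin(indexed, 3) for t in part]
recovered = [r for _, r in sorted(shuffled)]
print(recovered == rows)  # True
```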
[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)
[ https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305691#comment-16305691 ] Dong Jiang commented on SPARK-13127: [~gaurav24], it looks like you are, like me, waiting for this ticket to be worked on. If you would like, help by commenting on this thread on the developer list to advocate for resolving this issue in the Spark 2.3 release: http://apache-spark-developers-list.1001551.n3.nabble.com/Timeline-for-Spark-2-3-td22793.html > Upgrade Parquet to 1.9 (Fixes parquet sorting) > -- > > Key: SPARK-13127 > URL: https://issues.apache.org/jira/browse/SPARK-13127 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1 >Reporter: Justin Pihony > > Currently, when you write a sorted DataFrame to Parquet, the data read > back out is not sorted by default. [This is due to a bug in > Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in > 1.9. > There is a workaround: read the file back in using a file glob (filepath/*). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22919) Bump Apache httpclient versions
[ https://issues.apache.org/jira/browse/SPARK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22919: Assignee: Apache Spark > Bump Apache httpclient versions > --- > > Key: SPARK-22919 > URL: https://issues.apache.org/jira/browse/SPARK-22919 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Fokko Driesprong >Assignee: Apache Spark > > I would like to bump the PATCH versions of both Apache httpclient and Apache > httpcore. I use the SparkTC Stocator library for connecting to an object > store, and I would like to align the versions to reduce Java version mismatches. > Furthermore, it is good to bump these versions since they fix stability and > performance issues: > https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt > https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22919) Bump Apache httpclient versions
[ https://issues.apache.org/jira/browse/SPARK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305649#comment-16305649 ] Apache Spark commented on SPARK-22919: -- User 'Fokko' has created a pull request for this issue: https://github.com/apache/spark/pull/20103 > Bump Apache httpclient versions > --- > > Key: SPARK-22919 > URL: https://issues.apache.org/jira/browse/SPARK-22919 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Fokko Driesprong > > I would like to bump the PATCH versions of both Apache httpclient and Apache > httpcore. I use the SparkTC Stocator library for connecting to an object > store, and I would like to align the versions to reduce Java version mismatches. > Furthermore, it is good to bump these versions since they fix stability and > performance issues: > https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt > https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22919) Bump Apache httpclient versions
[ https://issues.apache.org/jira/browse/SPARK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22919: Assignee: (was: Apache Spark) > Bump Apache httpclient versions > --- > > Key: SPARK-22919 > URL: https://issues.apache.org/jira/browse/SPARK-22919 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Fokko Driesprong > > I would like to bump the PATCH versions of both Apache httpclient and Apache > httpcore. I use the SparkTC Stocator library for connecting to an object > store, and I would like to align the versions to reduce Java version mismatches. > Furthermore, it is good to bump these versions since they fix stability and > performance issues: > https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt > https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22919) Bump Apache httpclient versions
Fokko Driesprong created SPARK-22919: Summary: Bump Apache httpclient versions Key: SPARK-22919 URL: https://issues.apache.org/jira/browse/SPARK-22919 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1 Reporter: Fokko Driesprong I would like to bump the PATCH versions of both the Apache httpclient and Apache httpcore. I use the SparkTC Stocator library for connecting to an object store, and I would like to align the versions to reduce Java version mismatches. Furthermore, it is good to bump these versions since they fix stability and performance issues: https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
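Until such a bump lands, a project hitting this mismatch can pin the httpcomponents versions itself. A minimal sbt sketch (the patch versions shown are illustrative picks from the 4.5.x/4.4.x lines, not necessarily the ones the eventual PR chooses):

```scala
// build.sbt (sbt 1.x) — force one httpclient/httpcore version across the
// Spark and Stocator dependency trees; versions here are examples only.
dependencyOverrides ++= Seq(
  "org.apache.httpcomponents" % "httpclient" % "4.5.4",
  "org.apache.httpcomponents" % "httpcore"   % "4.4.8"
)
```

With an override in place, `sbt evicted` should report the pinned versions winning wherever Spark and Stocator disagree.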
[jira] [Commented] (SPARK-22577) executor page blacklist status should update with TaskSet level blacklisting
[ https://issues.apache.org/jira/browse/SPARK-22577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305643#comment-16305643 ] Attila Zsolt Piros commented on SPARK-22577: I started working on this issue > executor page blacklist status should update with TaskSet level blacklisting > > > Key: SPARK-22577 > URL: https://issues.apache.org/jira/browse/SPARK-22577 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.1.1 >Reporter: Thomas Graves > > right now the executor blacklist status only updates with the > BlacklistTracker after a task set has finished and propagated the > blacklisting to the application level. We should change that to show at the > taskset level as well. Without this it can be very confusing to the user why > things aren't running. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22875) Assembly build fails for a high user id
[ https://issues.apache.org/jira/browse/SPARK-22875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-22875: - Assignee: Gera Shegalov > Assembly build fails for a high user id > --- > > Key: SPARK-22875 > URL: https://issues.apache.org/jira/browse/SPARK-22875 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.1 >Reporter: Gera Shegalov >Assignee: Gera Shegalov >Priority: Minor > Fix For: 2.3.0 > > > {code} > ./build/mvn package -Pbigtop-dist -DskipTests > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single (dist) on project > spark-assembly_2.11: Execution dist of goal > org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single failed: user id > '123456789' is too big ( > 2097151 ). -> [Help 1] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-22875) Assembly build fails for a high user id
[ https://issues.apache.org/jira/browse/SPARK-22875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22875. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20055 [https://github.com/apache/spark/pull/20055] > Assembly build fails for a high user id > --- > > Key: SPARK-22875 > URL: https://issues.apache.org/jira/browse/SPARK-22875 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.1 >Reporter: Gera Shegalov >Priority: Minor > Fix For: 2.3.0 > > > {code} > ./build/mvn package -Pbigtop-dist -DskipTests > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single (dist) on project > spark-assembly_2.11: Execution dist of goal > org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single failed: user id > '123456789' is too big ( > 2097151 ). -> [Help 1] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
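The error comes from the default tar header format, whose 8-digit octal uid/gid field tops out at 2097151. The usual maven-assembly-plugin remedy, and presumably what the linked PR applies, is to switch to the POSIX tar format, sketched here (plugin placement within the pom is illustrative):

```xml
<!-- pom.xml sketch: POSIX tar headers permit uid/gid values > 2097151 -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <tarLongFileMode>posix</tarLongFileMode>
  </configuration>
</plugin>
```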
[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)
[ https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305612#comment-16305612 ] Gaurav Shah commented on SPARK-13127: - I am surprised people haven't hit https://issues.apache.org/jira/browse/PARQUET-353; I constantly face OOM errors on a continuous streaming application. Wondering if we could get Parquet 1.9.1 and then upgrade Spark to use it. https://issues.apache.org/jira/browse/PARQUET-1027 > Upgrade Parquet to 1.9 (Fixes parquet sorting) > -- > > Key: SPARK-13127 > URL: https://issues.apache.org/jira/browse/SPARK-13127 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1 >Reporter: Justin Pihony > > Currently, when you write a sorted DataFrame to Parquet, reading the > data back out is not sorted by default. [This is due to a bug in > Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in > 1.9. > There is a workaround to read the file back in using a file glob (filepath/*). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
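The file-glob workaround from the issue description can be sketched as follows (assumes a live `spark` session; `sortedDF` and the path are illustrative):

```scala
// Sketch of the PARQUET-241 workaround: reading the part files back via a
// glob, rather than the directory, preserves the sorted file ordering.
sortedDF.write.parquet("/tmp/sorted_table")

// Directory read — row ordering across part files is not guaranteed:
val unordered = spark.read.parquet("/tmp/sorted_table")

// Glob read — the workaround mentioned in the description:
val ordered = spark.read.parquet("/tmp/sorted_table/*")
```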
[jira] [Comment Edited] (SPARK-17123) Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile
[ https://issues.apache.org/jira/browse/SPARK-17123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305577#comment-16305577 ] Wassim edited comment on SPARK-17123 at 12/28/17 4:39 PM: -- Hello, I am having the same issue at runtime in Java with Spark 2.2.0; could you please suggest a solution? {{org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 214, Column 114: No applicable constructor/method found for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"}} (dependencies: org.apache.spark:spark-core_2.11:2.2.0, org.apache.spark:spark-sql_2.11:2.2.0) was (Author: wassimdr): Hello, I am having the same issue in Java with Spark 2.2.0; could you please suggest a solution? {{org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 214, Column 114: No applicable constructor/method found for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"}} (dependencies: org.apache.spark:spark-core_2.11:2.2.0, org.apache.spark:spark-sql_2.11:2.2.0) > Performing set operations that combine string and date / timestamp columns > may result in generated projection code which doesn't compile > > > Key: SPARK-17123 > URL: https://issues.apache.org/jira/browse/SPARK-17123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.0.2, 2.1.0 > > > The following example program causes SpecificSafeProjection code generation > to produce Java code which doesn't compile: > {code} > import org.apache.spark.sql.types._ > spark.sql("set spark.sql.codegen.fallback=false") > val dateDF = spark.createDataFrame(sc.parallelize(Seq(Row(new > java.sql.Date(0)))), > StructType(StructField("value", DateType) :: Nil)) > val longDF = 
sc.parallelize(Seq(new java.sql.Date(0).toString)).toDF > dateDF.union(longDF).collect() > {code} > This fails at runtime with the following error: > {code} > failed to compile: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 28, Column 107: No applicable constructor/method found > for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates > are: "public static java.sql.Date > org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)" > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificSafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificSafeProjection extends > org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private MutableRow mutableRow; > /* 009 */ private Object[] values; > /* 010 */ private org.apache.spark.sql.types.StructType schema; > /* 011 */ > /* 012 */ > /* 013 */ public SpecificSafeProjection(Object[] references) { > /* 014 */ this.references = references; > /* 015 */ mutableRow = (MutableRow) references[references.length - 1]; > /* 016 */ > /* 017 */ this.schema = (org.apache.spark.sql.types.StructType) > references[0]; > /* 018 */ } > /* 019 */ > /* 020 */ public java.lang.Object apply(java.lang.Object _i) { > /* 021 */ InternalRow i = (InternalRow) _i; > /* 022 */ > /* 023 */ values = new Object[1]; > /* 024 */ > /* 025 */ boolean isNull2 = i.isNullAt(0); > /* 026 */ UTF8String value2 = isNull2 ? null : (i.getUTF8String(0)); > /* 027 */ boolean isNull1 = isNull2; > /* 028 */ final java.sql.Date value1 = isNull1 ? 
null : > org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(value2); > /* 029 */ isNull1 = value1 == null; > /* 030 */ if (isNull1) { > /* 031 */ values[0] = null; > /* 032 */ } else { > /* 033 */ values[0] = value1; > /* 034 */ } > /* 035 */ > /* 036 */ final org.apache.spark.sql.Row value = new > org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema(values, > schema); > /* 037 */ if (false) { > /* 038 */ mutableRow.setNullAt(0); > /* 039 */ } else { > /* 040 */ > /* 041 */ mutableRow.update(0, value); > /* 042 */ } > /* 043 */ > /* 044 */ return mutableRow; > /* 045 */ } > /* 046 */ } > {code} > Here, the invocation of {{DateTimeUtils.toJavaDate}} is incorrect because the > generated code tries to call it with a UTF8String while the method expects an > int instead. -- This message was sent by Atlassi
[jira] [Commented] (SPARK-17123) Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile
[ https://issues.apache.org/jira/browse/SPARK-17123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305577#comment-16305577 ] Wassim commented on SPARK-17123: Hello, I am having the same issue in Java with Spark 2.2.0; could you please suggest a solution? {{org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 214, Column 114: No applicable constructor/method found for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"}} (dependencies: org.apache.spark:spark-core_2.11:2.2.0, org.apache.spark:spark-sql_2.11:2.2.0) > Performing set operations that combine string and date / timestamp columns > may result in generated projection code which doesn't compile > > > Key: SPARK-17123 > URL: https://issues.apache.org/jira/browse/SPARK-17123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.0.2, 2.1.0 > > > The following example program causes SpecificSafeProjection code generation > to produce Java code which doesn't compile: > {code} > import org.apache.spark.sql.types._ > spark.sql("set spark.sql.codegen.fallback=false") > val dateDF = spark.createDataFrame(sc.parallelize(Seq(Row(new > java.sql.Date(0)))), > StructType(StructField("value", DateType) :: Nil)) > val longDF = sc.parallelize(Seq(new java.sql.Date(0).toString)).toDF > dateDF.union(longDF).collect() > {code} > This fails at runtime with the following error: > {code} > failed to compile: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 28, Column 107: No applicable constructor/method found > for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates > are: "public static java.sql.Date > org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)" > /* 001 */ public java.lang.Object generate(Object[] references) 
{ > /* 002 */ return new SpecificSafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificSafeProjection extends > org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private MutableRow mutableRow; > /* 009 */ private Object[] values; > /* 010 */ private org.apache.spark.sql.types.StructType schema; > /* 011 */ > /* 012 */ > /* 013 */ public SpecificSafeProjection(Object[] references) { > /* 014 */ this.references = references; > /* 015 */ mutableRow = (MutableRow) references[references.length - 1]; > /* 016 */ > /* 017 */ this.schema = (org.apache.spark.sql.types.StructType) > references[0]; > /* 018 */ } > /* 019 */ > /* 020 */ public java.lang.Object apply(java.lang.Object _i) { > /* 021 */ InternalRow i = (InternalRow) _i; > /* 022 */ > /* 023 */ values = new Object[1]; > /* 024 */ > /* 025 */ boolean isNull2 = i.isNullAt(0); > /* 026 */ UTF8String value2 = isNull2 ? null : (i.getUTF8String(0)); > /* 027 */ boolean isNull1 = isNull2; > /* 028 */ final java.sql.Date value1 = isNull1 ? null : > org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(value2); > /* 029 */ isNull1 = value1 == null; > /* 030 */ if (isNull1) { > /* 031 */ values[0] = null; > /* 032 */ } else { > /* 033 */ values[0] = value1; > /* 034 */ } > /* 035 */ > /* 036 */ final org.apache.spark.sql.Row value = new > org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema(values, > schema); > /* 037 */ if (false) { > /* 038 */ mutableRow.setNullAt(0); > /* 039 */ } else { > /* 040 */ > /* 041 */ mutableRow.update(0, value); > /* 042 */ } > /* 043 */ > /* 044 */ return mutableRow; > /* 045 */ } > /* 046 */ } > {code} > Here, the invocation of {{DateTimeUtils.toJavaDate}} is incorrect because the > generated code tries to call it with a UTF8String while the method expects an > int instead. 
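On an affected version (the fix shipped in 2.0.2/2.1.0), a common way to sidestep the miscompiled projection is to align the column types explicitly before the union. A sketch against the issue's own example (`dateDF` as defined there; this is a workaround, not the committed fix):

```scala
import org.apache.spark.sql.types.DateType

// Workaround sketch: cast the string column to DateType so both sides of
// the union share one schema and no UTF8String-to-Date coercion is ever
// code-generated.
val stringDF = sc.parallelize(Seq(new java.sql.Date(0).toString)).toDF("value")
val aligned  = stringDF.select(stringDF("value").cast(DateType).as("value"))
dateDF.union(aligned).collect()
```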
[jira] [Commented] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine",
[ https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305549#comment-16305549 ] Sean Owen commented on SPARK-22918: --- It sounds like you have a SecurityManager enabled. Can you turn that off? or are you sure you haven't made any special configurations like that? > sbt test (spark - local) fail after upgrading to 2.2.1 with: > java.security.AccessControlException: access denied > org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" ) > > > Key: SPARK-22918 > URL: https://issues.apache.org/jira/browse/SPARK-22918 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Damian Momot > > After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started > to fail with following exception: > {noformat} > java.security.AccessControlException: access denied > org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" ) > at > java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) > at > java.security.AccessController.checkPermission(AccessController.java:884) > at > org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown > Source) > at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown > Source) > at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source) > at java.security.AccessController.doPrivileged(Native Method) > at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source) > at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source) > at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source) > at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at java.lang.Class.newInstance(Class.java:442) > at > org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47) > at > org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54) > at > org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238) > at > org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131) > at > org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) > at > org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325) > at > org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282) > at > org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) > at > org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301) > at > org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187) > at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775) > at > org.datanucleus.api.jdo.JDOPersisten
[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u
[ https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damian Momot updated SPARK-22918: - Description: After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started to fail with following exception: {noformat} java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" ) at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) at java.security.AccessController.checkPermission(AccessController.java:884) at org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown Source) at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source) at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source) at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source) at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47) at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131) at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325) at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282) at org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240) at org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301) at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187) at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHe
[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u
[ https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damian Momot updated SPARK-22918: - Description: After upgrading 2.2.0 -> 2.2.1 sbt test command started to fail with following exception: {noformat} java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" ) at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) at java.security.AccessController.checkPermission(AccessController.java:884) at org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown Source) at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source) at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source) at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source) at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47) at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131) at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325) at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282) at org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240) at org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301) at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187) at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u
[ https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damian Momot updated SPARK-22918: - Description: After upgrading 2.2.0 -> 2.2.1 sbt test command started to fail with following exception: {noformat}
java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
    at java.security.AccessController.checkPermission(AccessController.java:884)
    at org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown Source)
    at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
    at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
    at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
    at org.apache.derby.jdbc.EmbeddedDriver.<init>(Unknown Source)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
    at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
    at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
    at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
    at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
    at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
    at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
    at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
[jira] [Created] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u
Damian Momot created SPARK-22918: Summary: sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" ) Key: SPARK-22918 URL: https://issues.apache.org/jira/browse/SPARK-22918 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1 Reporter: Damian Momot After upgrading 2.2.0 -> 2.2.1 sbt test command started to fail with following exception: {noformat}
java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
    at java.security.AccessController.checkPermission(AccessController.java:884)
    at org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown Source)
    at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
    at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
    at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
    at org.apache.derby.jdbc.EmbeddedDriver.<init>(Unknown Source)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
    at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
    at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
    at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
    at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
    at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
    at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
    at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.jdo.JDOHelper$16.run(JDOHelper.jav
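[Editor's note] The {{AccessControlException}} above is embedded Derby refusing to boot when a Java SecurityManager is active without the {{usederbyinternals}} permission. As an illustrative workaround only (not necessarily the fix that eventually landed for this ticket), the permission can be granted to the JVM running the tests via a security policy file; the grant syntax follows Derby's documented {{SystemPermission}}, while the file name and scope below are assumptions:

```text
// Hypothetical derby.policy fragment for the JVM running `sbt test`.
// A global grant is shown for brevity; a codeBase-scoped grant targeting
// the Derby jar is the more restrictive option.
grant {
  permission org.apache.derby.security.SystemPermission "engine", "usederbyinternals";
};
```

It would then be passed to the test JVM with something like {{-Djava.security.policy=derby.policy}}; how to forward that flag depends on the sbt setup.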
[jira] [Resolved] (SPARK-22917) Should not try to generate histogram for empty/null columns
[ https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22917. - Resolution: Fixed Issue resolved by pull request 20102 [https://github.com/apache/spark/pull/20102] > Should not try to generate histogram for empty/null columns > --- > > Key: SPARK-22917 > URL: https://issues.apache.org/jira/browse/SPARK-22917 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22917) Should not try to generate histogram for empty/null columns
[ https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22917: --- Assignee: Zhenhua Wang > Should not try to generate histogram for empty/null columns > --- > > Key: SPARK-22917 > URL: https://issues.apache.org/jira/browse/SPARK-22917 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang >Assignee: Zhenhua Wang > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21208) Ability to "setLocalProperty" from sc, in sparkR
[ https://issues.apache.org/jira/browse/SPARK-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-21208: Assignee: Hyukjin Kwon > Ability to "setLocalProperty" from sc, in sparkR > > > Key: SPARK-21208 > URL: https://issues.apache.org/jira/browse/SPARK-21208 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 2.1.1 >Reporter: Karuppayya >Assignee: Hyukjin Kwon > Fix For: 2.3.0 > > > Checked the API > [documentation|https://spark.apache.org/docs/latest/api/R/index.html] for > sparkR. > Was not able to find a way to *setLocalProperty* on sc. > Need ability to *setLocalProperty* on sparkContext(similar to available for > pyspark, scala) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21208) Ability to "setLocalProperty" from sc, in sparkR
[ https://issues.apache.org/jira/browse/SPARK-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-21208. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20075 [https://github.com/apache/spark/pull/20075] > Ability to "setLocalProperty" from sc, in sparkR > > > Key: SPARK-21208 > URL: https://issues.apache.org/jira/browse/SPARK-21208 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 2.1.1 >Reporter: Karuppayya >Assignee: Hyukjin Kwon > Fix For: 2.3.0 > > > Checked the API > [documentation|https://spark.apache.org/docs/latest/api/R/index.html] for > sparkR. > Was not able to find a way to *setLocalProperty* on sc. > Need ability to *setLocalProperty* on sparkContext(similar to available for > pyspark, scala) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
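[Editor's note] With pull request 20075 merged, SparkR gains the same thread-local property mechanism already available in the Scala and Python APIs. A minimal sketch (requires a running Spark session; the scheduler-pool key/value pair is only an illustration of a typical use):

```r
library(SparkR, lib.loc = paste0(Sys.getenv("SPARK_HOME"), "/R/lib"))
sparkR.session(master = "local[*]")

# Set a thread-local property on the active SparkContext, e.g. to route
# jobs submitted from this thread to a named fair-scheduler pool.
setLocalProperty("spark.scheduler.pool", "poolA")

# Read the property back for the current thread.
getLocalProperty("spark.scheduler.pool")

sparkR.session.stop()
```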
[jira] [Resolved] (SPARK-22843) R localCheckpoint API
[ https://issues.apache.org/jira/browse/SPARK-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-22843. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20073 [https://github.com/apache/spark/pull/20073] > R localCheckpoint API > - > > Key: SPARK-22843 > URL: https://issues.apache.org/jira/browse/SPARK-22843 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22843) R localCheckpoint API
[ https://issues.apache.org/jira/browse/SPARK-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-22843: Assignee: Hyukjin Kwon > R localCheckpoint API > - > > Key: SPARK-22843 > URL: https://issues.apache.org/jira/browse/SPARK-22843 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Hyukjin Kwon > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21828) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB...again
[ https://issues.apache.org/jira/browse/SPARK-21828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-21828. - Resolution: Duplicate > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB...again > - > > Key: SPARK-21828 > URL: https://issues.apache.org/jira/browse/SPARK-21828 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Otis Smart >Priority: Critical > > Hello! > 1. I encounter a similar issue (see below text) on Pyspark 2.2 (e.g., > dataframe with ~5 rows x 1100+ columns as input to ".fit()" method of > CrossValidator() that includes Pipeline() that includes StringIndexer(), > VectorAssembler() and DecisionTreeClassifier()). > 2. Was the aforementioned patch (aka > fix(https://github.com/apache/spark/pull/15480) not included in the latest > release; what are the reason and (source) of and solution to this persistent > issue please? > py4j.protocol.Py4JJavaError: An error occurred while calling o9396.fit. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 38 > in stage 18.0 failed 4 times, most recent failure: Lost task 38.3 in stage > 18.0 (TID 1996, ip-10-0-14-83.ec2.internal, executor 4): > java.util.concurrent.ExecutionException: java.lang.Exception: failed to > compile: org.codehaus.janino.JaninoRuntimeException: Code of method > "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB
> /* 001 */ public SpecificOrdering generate(Object[] references) {
> /* 002 */   return new SpecificOrdering(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificOrdering extends org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */
> /* 009 */
> /* 010 */   public SpecificOrdering(Object[] references) {
> /* 011 */     this.references = references;
> /* 012 */
> /* 013 */   }
> /* 014 */
> /* 015 */
> /* 016 */
> /* 017 */   public int compare(InternalRow a, InternalRow b) {
> /* 018 */     InternalRow i = null;  // Holds current row being evaluated.
> /* 019 */
> /* 020 */     i = a;
> /* 021 */     boolean isNullA;
> /* 022 */     double primitiveA;
> /* 023 */     {
> /* 024 */
> /* 025 */       double value = i.getDouble(0);
> /* 026 */       isNullA = false;
> /* 027 */       primitiveA = value;
> /* 028 */     }
> /* 029 */     i = b;
> /* 030 */     boolean isNullB;
> /* 031 */     double primitiveB;
> /* 032 */     {
> /* 033 */
> /* 034 */       double value = i.getDouble(0);
> /* 035 */       isNullB = false;
> /* 036 */       primitiveB = value;
> /* 037 */     }
> /* 038 */     if (isNullA && isNullB) {
> /* 039 */       // Nothing
> /* 040 */     } else if (isNullA) {
> /* 041 */       return -1;
> /* 042 */     } else if (isNullB) {
> /* 043 */       return 1;
> /* 044 */     } else {
> /* 045 */       int comp = org.apache.spark.util.Utils.nanSafeCompareDoubles(primitiveA, primitiveB);
> /* 046 */       if (comp != 0) {
> /* 047 */         return comp;
> /* 048 */       }
> /* 049 */     }
> /* 050 */
> /* 051 */
> ... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
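[Editor's note] For readers hitting this 64 KB method limit before a fixed release is available, two SQL configs are commonly suggested as mitigations rather than fixes: {{spark.sql.codegen.wholeStage}} (disables whole-stage code generation) and {{spark.sql.codegen.fallback}} (lets Spark fall back to interpreted evaluation when generated code fails to compile). Whether they help depends on which codegen path overflows, so treat this spark-defaults fragment as an assumption to verify against your Spark version:

```text
# spark-defaults.conf fragment (illustrative; keys exist in Spark 2.x)
spark.sql.codegen.wholeStage   false
spark.sql.codegen.fallback     true
```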
[jira] [Commented] (SPARK-21616) SparkR 2.3.0 migration guide, release note
[ https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305192#comment-16305192 ] Felix Cheung commented on SPARK-21616: -- SPARK-22315 > SparkR 2.3.0 migration guide, release note > -- > > Key: SPARK-21616 > URL: https://issues.apache.org/jira/browse/SPARK-21616 > Project: Spark > Issue Type: Documentation > Components: Documentation, SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: Felix Cheung > > From looking at changes since 2.2.0, this/these should be documented in the > migration guide / release note for the 2.3.0 release, as it is behavior > changes -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22917) Should not try to generate histogram for empty/null columns
[ https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22917: Assignee: Apache Spark > Should not try to generate histogram for empty/null columns > --- > > Key: SPARK-22917 > URL: https://issues.apache.org/jira/browse/SPARK-22917 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang >Assignee: Apache Spark > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22917) Should not try to generate histogram for empty/null columns
[ https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22917: Assignee: (was: Apache Spark) > Should not try to generate histogram for empty/null columns > --- > > Key: SPARK-22917 > URL: https://issues.apache.org/jira/browse/SPARK-22917 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22917) Should not try to generate histogram for empty/null columns
[ https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305177#comment-16305177 ] Apache Spark commented on SPARK-22917: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/20102 > Should not try to generate histogram for empty/null columns > --- > > Key: SPARK-22917 > URL: https://issues.apache.org/jira/browse/SPARK-22917 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22917) Should not try to generate histogram for empty/null columns
Zhenhua Wang created SPARK-22917: Summary: Should not try to generate histogram for empty/null columns Key: SPARK-22917 URL: https://issues.apache.org/jira/browse/SPARK-22917 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Zhenhua Wang -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error
[ https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305157#comment-16305157 ] Felix Cheung commented on SPARK-21727: -- [~neilalex] How is it going?
> Operating on an ArrayType in a SparkR DataFrame throws error
>
> Key: SPARK-21727
> URL: https://issues.apache.org/jira/browse/SPARK-21727
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 2.2.0
> Reporter: Neil Alexander McQuarrie
> Assignee: Neil Alexander McQuarrie
>
> Previously [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] this as a stack overflow question but it seems to be a bug.
> If I have an R data.frame where one of the column data types is an integer *list* -- i.e., each of the elements in the column embeds an entire R list of integers -- then it seems I can convert this data.frame to a SparkR DataFrame just fine... SparkR treats the column as ArrayType(Double).
> However, any subsequent operation on this SparkR DataFrame appears to throw an error.
> Create an example R data.frame:
> {code}
> indices <- 1:4
> myDf <- data.frame(indices)
> myDf$data <- list(rep(0, 20))
> {code}
> Examine it to make sure it looks okay:
> {code}
> > str(myDf)
> 'data.frame': 4 obs. of 2 variables:
> $ indices: int 1 2 3 4
> $ data :List of 4
> ..$ : num 0 0 0 0 0 0 0 0 0 0 ...
> ..$ : num 0 0 0 0 0 0 0 0 0 0 ...
> ..$ : num 0 0 0 0 0 0 0 0 0 0 ...
> ..$ : num 0 0 0 0 0 0 0 0 0 0 ...
> > head(myDf)
> indices data
> 1 1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> 2 2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> 3 3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> 4 4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> {code}
> Convert it to a SparkR DataFrame:
> {code}
> library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
> sparkR.session(master = "local[*]")
> mySparkDf <- as.DataFrame(myDf)
> {code}
> Examine the SparkR DataFrame schema; notice that the list column was successfully converted to ArrayType:
> {code}
> > schema(mySparkDf)
> StructType
> |-name = "indices", type = "IntegerType", nullable = TRUE
> |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
> {code}
> However, operating on the SparkR DataFrame throws an error:
> {code}
> > collect(mySparkDf)
> 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Double is not a valid external type for schema of array<double>
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
> ... long stack trace ...
> {code}
> Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org