[jira] [Commented] (SPARK-24703) Unable to multiply calendar interval with long/int
[ https://issues.apache.org/jira/browse/SPARK-24703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529456#comment-16529456 ] Takeshi Yamamuro commented on SPARK-24703:
--
Yea, I've noticed that the SQL standard supports the syntax: http://download.mimer.com/pub/developer/docs/html_100/Mimer_SQL_Engine_DocSet/Syntax_Rules4.html#wp1113535

> Unable to multiply calendar interval with long/int
> --
> Key: SPARK-24703
> URL: https://issues.apache.org/jira/browse/SPARK-24703
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Priyanka Garg
> Priority: Major
>
> When I try to multiply a calendar interval by a long/int, I get the error below. The same syntax is supported in Postgres.
> spark.sql("select 3 * interval '1' day").show()
> org.apache.spark.sql.AnalysisException: cannot resolve '(3 * interval 1 days)' due to data type mismatch: differing types in '(3 * interval 1 days)' (int and calendarinterval).; line 1 pos 7;
> 'Project [unresolvedalias((3 * interval 1 days), None)]
> +- OneRowRelation
>
> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
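For reference, the scaling semantics the reporter expects (and that Postgres implements for `3 * interval '1' day`) can be sketched with Python's `datetime.timedelta`, which supports multiplication by an int. This is only an illustration of the expected behavior, not Spark's `CalendarInterval` API.

```python
from datetime import timedelta

# Postgres evaluates `select 3 * interval '1' day` to a 3-day interval.
# timedelta implements the same scaling rule: int * interval -> interval.
one_day = timedelta(days=1)
scaled = 3 * one_day

print(scaled)                       # a 3-day interval
print(scaled == timedelta(days=3))  # True
```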
[jira] [Assigned] (SPARK-24665) Add SQLConf in PySpark to manage all sql configs
[ https://issues.apache.org/jira/browse/SPARK-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-24665:
Assignee: Li Yuanjian

> Add SQLConf in PySpark to manage all sql configs
> --
> Key: SPARK-24665
> URL: https://issues.apache.org/jira/browse/SPARK-24665
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 2.3.0
> Reporter: Li Yuanjian
> Assignee: Li Yuanjian
> Priority: Major
> Fix For: 2.4.0
>
> When new configs are added in PySpark, we currently read them by hard-coding the config name and default value. We should move all the configs into a class, like what we did in Spark SQL's SQLConf.
[jira] [Resolved] (SPARK-24665) Add SQLConf in PySpark to manage all sql configs
[ https://issues.apache.org/jira/browse/SPARK-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24665.
--
Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21648 [https://github.com/apache/spark/pull/21648]

> Add SQLConf in PySpark to manage all sql configs
> --
> Key: SPARK-24665
> URL: https://issues.apache.org/jira/browse/SPARK-24665
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 2.3.0
> Reporter: Li Yuanjian
> Assignee: Li Yuanjian
> Priority: Major
> Fix For: 2.4.0
>
> When new configs are added in PySpark, we currently read them by hard-coding the config name and default value. We should move all the configs into a class, like what we did in Spark SQL's SQLConf.
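The pattern this ticket proposes can be sketched in plain Python: instead of scattering hard-coded config names and defaults through call sites, declare them once as named entries in one class. The names below (`ConfigEntry`, the dict-backed session) are illustrative only, not the actual PySpark API that the pull request added.

```python
# Minimal sketch of a centralized SQL-config registry, assuming a
# plain dict stands in for a real SparkSession's conf.

class ConfigEntry:
    def __init__(self, key, default):
        self.key = key
        self.default = default

    def get(self, conf):
        # Fall back to the registered default when the key is unset.
        return conf.get(self.key, self.default)

class SQLConf:
    # Every knob declared in one place instead of at hard-coded call sites.
    SHUFFLE_PARTITIONS = ConfigEntry("spark.sql.shuffle.partitions", 200)
    ADAPTIVE_ENABLED = ConfigEntry("spark.sql.adaptive.enabled", False)

session_conf = {"spark.sql.adaptive.enabled": True}
print(SQLConf.SHUFFLE_PARTITIONS.get(session_conf))  # default applies: 200
print(SQLConf.ADAPTIVE_ENABLED.get(session_conf))    # explicit setting wins: True
```

The design choice mirrors what Scala's `SQLConf` does: the default lives next to the key, so a renamed or retuned config changes in exactly one place.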
[jira] [Comment Edited] (SPARK-24705) Spark.sql.adaptive.enabled=true is enabled and self-join query
[ https://issues.apache.org/jira/browse/SPARK-24705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529421#comment-16529421 ] Takeshi Yamamuro edited comment on SPARK-24705 at 7/2/18 6:20 AM:
--
I checked that the master branch has the same issue. Also, it seems this issue only happens when using JDBC sources.

{code:java}
// Prepare test data in postgresql
postgres=# create table device_loc(imei int, speed int);
CREATE TABLE
postgres=# insert into device_loc values (1, 1);
INSERT 0 1
postgres=# select * from device_loc;
 imei | speed
------+-------
    1 |     1
(1 row)

// Register as a jdbc table
scala> val jdbcTable = spark.read.jdbc("jdbc:postgresql:postgres", "device_loc", options)
scala> jdbcTable.registerTempTable("device_loc")
scala> sql("SELECT * FROM device_loc").show
+----+-----+
|imei|speed|
+----+-----+
|   1|    1|
+----+-----+

// Prepare a query
scala> :paste
val df = sql("""
  select tv_a.imei
  from ( select a.imei,a.speed from device_loc a) tv_a
  inner join ( select a.imei,a.speed from device_loc a ) tv_b on tv_a.imei = tv_b.imei
  group by tv_a.imei
""")

// Run tests
scala> sql("SET spark.sql.adaptive.enabled=false")
scala> df.show
+----+
|imei|
+----+
|   1|
+----+

scala> sql("SET spark.sql.adaptive.enabled=true")
scala> df.show
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange(coordinator id: 1401717308) hashpartitioning(imei#0, 200), coordinator[target post-shuffle partition size: 67108864]
+- *(1) Scan JDBCRelation(device_loc) [numPartitions=1] [imei#0] PushedFilters: [*IsNotNull(imei)], ReadSchema: struct
Caused by: java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:201)
  at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.postShuffleRDD(ExchangeCoordinator.scala:259)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:124)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 100 more
{code}

It seems this issue doesn't happen with the other datasources:

{code:java}
scala> sql("SET spark.sql.adaptive.enabled=true")
scala> spark.range(1).selectExpr("id AS imei", "id AS speed").write.saveAsTable("device_loc")
scala> :paste
val df = sql("""
  select tv_a.imei
  from ( select a.imei,a.speed from device_loc a) tv_a
  inner join ( select a.imei,a.speed from device_loc a ) tv_b on tv_a.imei = tv_b.imei
  group by tv_a.imei
""")
scala> df.show()
+----+
|imei|
+----+
|   0|
+----+
{code}
[jira] [Commented] (SPARK-24705) Spark.sql.adaptive.enabled=true is enabled and self-join query
[ https://issues.apache.org/jira/browse/SPARK-24705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529421#comment-16529421 ] Takeshi Yamamuro commented on SPARK-24705:
--
I checked that the master branch has the same issue. Also, it seems this issue only happens when using JDBC sources.

{code}
// Prepare test data in postgresql
postgres=# create table device_loc(imei int, speed int);
CREATE TABLE
postgres=# insert into device_loc values (1, 1);
INSERT 0 1
postgres=# select * from device_loc;
 imei | speed
------+-------
    1 |     1
(1 row)

// Register as a jdbc table
scala> val jdbcTable = spark.read.jdbc("jdbc:postgresql:postgres", "device_loc", options)
scala> jdbcTable.registerTempTable("device_loc")
scala> sql("SELECT * FROM device_loc").show
+----+-----+
|imei|speed|
+----+-----+
|   1|    1|
+----+-----+

// Prepare a query
scala> :paste
val df = sql("""
  select tv_a.imei
  from ( select a.imei,a.speed from device_loc a) tv_a
  inner join ( select a.imei,a.speed from device_loc a ) tv_b on tv_a.imei = tv_b.imei
  group by tv_a.imei
""")

// Run tests
scala> sql("SET spark.sql.adaptive.enabled=false")
scala> df.show
+----+
|imei|
+----+
|   1|
+----+

scala> sql("SET spark.sql.adaptive.enabled=true")
scala> df.show
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange(coordinator id: 1401717308) hashpartitioning(imei#0, 200), coordinator[target post-shuffle partition size: 67108864]
+- *(1) Scan JDBCRelation(device_loc) [numPartitions=1] [imei#0] PushedFilters: [*IsNotNull(imei)], ReadSchema: struct
Caused by: java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:201)
  at org.apache.spark.sql.execution.exchange.ExchangeCoordinator.postShuffleRDD(ExchangeCoordinator.scala:259)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:124)
  at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 100 more
{code}

This issue doesn't happen with other datasources:

{code}
scala> sql("SET spark.sql.adaptive.enabled=true")
scala> spark.range(1).selectExpr("id AS imei", "id AS speed").write.saveAsTable("device_loc")
scala> :paste
val df = sql("""
  select tv_a.imei
  from ( select a.imei,a.speed from device_loc a) tv_a
  inner join ( select a.imei,a.speed from device_loc a ) tv_b on tv_a.imei = tv_b.imei
  group by tv_a.imei
""")
scala> df.show()
+----+
|imei|
+----+
|   0|
+----+
{code}

> Spark.sql.adaptive.enabled=true is enabled and self-join query
> --
> Key: SPARK-24705
> URL: https://issues.apache.org/jira/browse/SPARK-24705
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.1
> Reporter: cheng
> Priority: Minor
> Attachments: Error stack.txt
>
> [~smilegator]
> When loading data using jdbc and enabling spark.sql.adaptive.enabled=true, for example loading a tableA table, unexpected results can occur when you use the following query.
> For example, where the device_loc table comes from the jdbc data source:
> select tv_a.imei
> from ( select a.imei,a.speed from device_loc a) tv_a
> inner join ( select a.imei,a.speed from device_loc a ) tv_b on tv_a.imei = tv_b.imei
> group by tv_a.imei
> When cache table device_loc is executed before this query, everything is fine. However, if you do not execute cache table, unexpected results occur and the query fails to execute.
> Remarks: the attachment records the stack trace from when the error occurred.
[jira] [Assigned] (SPARK-24706) Support ByteType and ShortType pushdown to parquet
[ https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-24706:
---
Assignee: Yuming Wang

> Support ByteType and ShortType pushdown to parquet
> --
> Key: SPARK-24706
> URL: https://issues.apache.org/jira/browse/SPARK-24706
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
[jira] [Updated] (SPARK-24706) Support ByteType and ShortType pushdown to parquet
[ https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24706:
Target Version/s: 2.4.0

> Support ByteType and ShortType pushdown to parquet
> --
> Key: SPARK-24706
> URL: https://issues.apache.org/jira/browse/SPARK-24706
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Yuming Wang
> Priority: Major
[jira] [Commented] (SPARK-24714) AnalysisSuite should use ClassTag to check the runtime instance
[ https://issues.apache.org/jira/browse/SPARK-24714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529405#comment-16529405 ] Chia-Ping Tsai commented on SPARK-24714:
[~maropu] thank you again. :)

> AnalysisSuite should use ClassTag to check the runtime instance
> ---
> Key: SPARK-24714
> URL: https://issues.apache.org/jira/browse/SPARK-24714
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Chia-Ping Tsai
> Priority: Minor
>
> {code:java}
> test("SPARK-22614 RepartitionByExpression partitioning") {
>   def checkPartitioning[T <: Partitioning](numPartitions: Int, exprs: Expression*): Unit = {
>     val partitioning = RepartitionByExpression(exprs, testRelation2, numPartitions).partitioning
>     assert(partitioning.isInstanceOf[T]) // this is always true because of type erasure
>   }{code}
> Spark supports Scala 2.10 and 2.11, so it is OK to introduce a ClassTag to correct the type check.
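The erasure pitfall this ticket describes has a simple analogue outside Scala: a check parameterized only by a type variable carries no runtime class, so it cannot distinguish anything. The fix, passing the expected class explicitly (the moral equivalent of a `ClassTag`), can be sketched in Python; the class names below are illustrative stand-ins, not the real Catalyst types.

```python
# Analogy to the ClassTag fix: hand the expected class to the check as a
# runtime value, so isinstance() tests the concrete type rather than an
# erased placeholder.

class Partitioning: ...
class HashPartitioning(Partitioning): ...
class RangePartitioning(Partitioning): ...

def check_partitioning(expected_cls, partitioning):
    # The class object travels with the call and is inspectable at runtime.
    return isinstance(partitioning, expected_cls)

print(check_partitioning(HashPartitioning, HashPartitioning()))   # True
print(check_partitioning(HashPartitioning, RangePartitioning()))  # False
```

In Scala the same idea is spelled `def checkPartitioning[T: ClassTag](...)` with `classTag[T].runtimeClass.isInstance(partitioning)`: the compiler materializes the class token that erasure would otherwise discard.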
[jira] [Commented] (SPARK-24708) Document the default spark url of master in standalone is "spark://localhost:7070"
[ https://issues.apache.org/jira/browse/SPARK-24708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529404#comment-16529404 ] Chia-Ping Tsai commented on SPARK-24708:
[~maropu] Thanks for the kind reminder.

> Document the default spark url of master in standalone is "spark://localhost:7070"
> --
> Key: SPARK-24708
> URL: https://issues.apache.org/jira/browse/SPARK-24708
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 2.3.1
> Reporter: Chia-Ping Tsai
> Priority: Trivial
>
> In the section "Starting a Cluster Manually" we give an example of starting a worker:
> {code:java}
> ./sbin/start-slave.sh {code}
> However, we only mention the default "web port", so readers may be misled into using the "web port" to start the worker. (Of course, I am a "reader" too. :()
> It seems to me that adding a brief description of the master's default spark URL would avoid the above ambiguity.
> For example:
> {code:java}
> - Similarly, you can start one or more workers and connect them to the master via:
> + Similarly, you can start one or more workers and connect them to the master's spark URL (default: spark://:7070) via:{code}
[jira] [Commented] (SPARK-24708) Document the default spark url of master in standalone is "spark://localhost:7070"
[ https://issues.apache.org/jira/browse/SPARK-24708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529394#comment-16529394 ] Takeshi Yamamuro commented on SPARK-24708:
--
Feel free to make a pr, and you can discuss it there. Btw, you don't need to file a jira because this is a trivial fix.

> Document the default spark url of master in standalone is "spark://localhost:7070"
> --
> Key: SPARK-24708
> URL: https://issues.apache.org/jira/browse/SPARK-24708
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 2.3.1
> Reporter: Chia-Ping Tsai
> Priority: Trivial
>
> In the section "Starting a Cluster Manually" we give an example of starting a worker:
> {code:java}
> ./sbin/start-slave.sh {code}
> However, we only mention the default "web port", so readers may be misled into using the "web port" to start the worker. (Of course, I am a "reader" too. :()
> It seems to me that adding a brief description of the master's default spark URL would avoid the above ambiguity.
> For example:
> {code:java}
> - Similarly, you can start one or more workers and connect them to the master via:
> + Similarly, you can start one or more workers and connect them to the master's spark URL (default: spark://:7070) via:{code}
[jira] [Updated] (SPARK-24715) sbt build brings a wrong jline versions
[ https://issues.apache.org/jira/browse/SPARK-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24715:
-
Priority: Critical (was: Major)

> sbt build brings a wrong jline versions
> ---
> Key: SPARK-24715
> URL: https://issues.apache.org/jira/browse/SPARK-24715
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Dongjoon Hyun
> Priority: Critical
>
> During SPARK-24418 (Upgrade Scala to 2.11.12 and 2.12.6), we upgraded the `jline` version together, so `mvn` works correctly. However, `sbt` brings in an old jline library and hits `NoSuchMethodError` in the `master` branch. Since we mainly use `mvn`, this is a dev-environment issue.
> {code}
> $ ./build/sbt -Pyarn -Phadoop-2.7 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package
> $ bin/spark-shell
> scala> Spark context Web UI available at http://localhost:4040
> Spark context available as 'sc' (master = local[*], app id = local-1530385877441).
> Spark session available as 'spark'.
> Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
> {code}
[jira] [Updated] (SPARK-24715) sbt build brings a wrong jline versions
[ https://issues.apache.org/jira/browse/SPARK-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24715:
--
Description:
During SPARK-24418 (Upgrade Scala to 2.11.12 and 2.12.6), we upgrade `jline` version together. So, `mvn` works correctly. However, `sbt` brings old jline library and is hitting `NoSuchMethodError` in `master` branch. Since we use `mvn` mainly, this is dev environment issue.
{code}
$ ./build/sbt -Pyarn -Phadoop-2.7 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package
$ bin/spark-shell
scala> Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1530385877441).
Spark session available as 'spark'.
Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
{code}

was:
During SPARK-24418 (Upgrade Scala to 2.11.12 and 2.12.6), we upgrade `jline` version together. So, `mvn` works correctly. However, `sbt` brings old jline library and is hitting `NoSuchMethodError` in `master` branch.
{code}
$ ./build/sbt -Pyarn -Phadoop-2.7 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package
$ bin/spark-shell
scala> Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1530385877441).
Spark session available as 'spark'.
Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
{code}

> sbt build brings a wrong jline versions
> ---
> Key: SPARK-24715
> URL: https://issues.apache.org/jira/browse/SPARK-24715
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Dongjoon Hyun
> Priority: Major
>
> During SPARK-24418 (Upgrade Scala to 2.11.12 and 2.12.6), we upgrade `jline` version together. So, `mvn` works correctly. However, `sbt` brings old jline library and is hitting `NoSuchMethodError` in `master` branch. Since we use `mvn` mainly, this is dev environment issue.
> {code}
> $ ./build/sbt -Pyarn -Phadoop-2.7 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package
> $ bin/spark-shell
> scala> Spark context Web UI available at http://localhost:4040
> Spark context available as 'sc' (master = local[*], app id = local-1530385877441).
> Spark session available as 'spark'.
> Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
> {code}
[jira] [Updated] (SPARK-24715) sbt build brings a wrong jline versions
[ https://issues.apache.org/jira/browse/SPARK-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24715:
--
Summary: sbt build brings a wrong jline versions (was: sbt build bring a wrong jline versions)

> sbt build brings a wrong jline versions
> ---
> Key: SPARK-24715
> URL: https://issues.apache.org/jira/browse/SPARK-24715
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Dongjoon Hyun
> Priority: Major
>
> During SPARK-24418 (Upgrade Scala to 2.11.12 and 2.12.6), we upgrade `jline` version together. So, `mvn` works correctly. However, `sbt` brings old jline library and is hitting `NoSuchMethodError` in `master` branch.
> {code}
> $ ./build/sbt -Pyarn -Phadoop-2.7 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package
> $ bin/spark-shell
> scala> Spark context Web UI available at http://localhost:4040
> Spark context available as 'sc' (master = local[*], app id = local-1530385877441).
> Spark session available as 'spark'.
> Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
> {code}
[jira] [Created] (SPARK-24715) sbt build bring a wrong jline versions
Dongjoon Hyun created SPARK-24715:
-
Summary: sbt build bring a wrong jline versions
Key: SPARK-24715
URL: https://issues.apache.org/jira/browse/SPARK-24715
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 2.4.0
Reporter: Dongjoon Hyun

During SPARK-24418 (Upgrade Scala to 2.11.12 and 2.12.6), we upgrade `jline` version together. So, `mvn` works correctly. However, `sbt` brings old jline library and is hitting `NoSuchMethodError` in `master` branch.
{code}
$ ./build/sbt -Pyarn -Phadoop-2.7 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package
$ bin/spark-shell
scala> Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1530385877441).
Spark session available as 'spark'.
Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
{code}
[jira] [Commented] (SPARK-24714) AnalysisSuite should use ClassTag to check the runtime instance
[ https://issues.apache.org/jira/browse/SPARK-24714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529373#comment-16529373 ] Takeshi Yamamuro commented on SPARK-24714:
--
You needn't do that; feel free to make a pr for this ticket. Btw, since this is trivial, I think you don't need to file a jira.

> AnalysisSuite should use ClassTag to check the runtime instance
> ---
> Key: SPARK-24714
> URL: https://issues.apache.org/jira/browse/SPARK-24714
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Chia-Ping Tsai
> Priority: Minor
>
> {code:java}
> test("SPARK-22614 RepartitionByExpression partitioning") {
>   def checkPartitioning[T <: Partitioning](numPartitions: Int, exprs: Expression*): Unit = {
>     val partitioning = RepartitionByExpression(exprs, testRelation2, numPartitions).partitioning
>     assert(partitioning.isInstanceOf[T]) // this is always true because of type erasure
>   }{code}
> Spark supports Scala 2.10 and 2.11, so it is OK to introduce a ClassTag to correct the type check.
[jira] [Commented] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken
[ https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529371#comment-16529371 ] Hyukjin Kwon commented on SPARK-24530:
--
Yea, I will post an email to the related threads and try to deal with it very soon. [~mengxr], mind if I set the priority to Critical, since we have a workaround to get through this anyway?

> Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken
> ---
> Key: SPARK-24530
> URL: https://issues.apache.org/jira/browse/SPARK-24530
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.4.0
> Reporter: Xiangrui Meng
> Assignee: Hyukjin Kwon
> Priority: Blocker
> Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png
>
> I generated Python docs from master locally using `make html`. However, the generated HTML doc doesn't render class docs correctly. I attached screenshots from the Spark 2.3 docs and from the master docs generated on my local machine. Not sure if this is because of my local setup.
> cc: [~dongjoon] Could you help verify?
>
> The following is our released doc status. Some recent docs seem to be broken.
> *2.1.x*
> (O) [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (O) [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.2.x*
> (O) [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.3.x*
> (O) [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
[jira] [Assigned] (SPARK-24713) AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed
[ https://issues.apache.org/jira/browse/SPARK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24713:
Assignee: Apache Spark

> AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed
> ---
> Key: SPARK-24713
> URL: https://issues.apache.org/jira/browse/SPARK-24713
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.3.1
> Reporter: Yuanbo Liu
> Assignee: Apache Spark
> Priority: Major
>
> We have hundreds of Kafka topics that need to be consumed in one application. The application master throws an OOM exception after hanging for nearly half an hour.
[jira] [Assigned] (SPARK-24713) AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed
[ https://issues.apache.org/jira/browse/SPARK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24713:
Assignee: (was: Apache Spark)

> AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed
> ---
> Key: SPARK-24713
> URL: https://issues.apache.org/jira/browse/SPARK-24713
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.3.1
> Reporter: Yuanbo Liu
> Priority: Major
>
> We have hundreds of Kafka topics that need to be consumed in one application. The application master throws an OOM exception after hanging for nearly half an hour.
[jira] [Commented] (SPARK-24713) AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed
[ https://issues.apache.org/jira/browse/SPARK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529367#comment-16529367 ] Apache Spark commented on SPARK-24713:
--
User 'yuanboliu' has created a pull request for this issue: https://github.com/apache/spark/pull/21690

> AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed
> ---
> Key: SPARK-24713
> URL: https://issues.apache.org/jira/browse/SPARK-24713
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.3.1
> Reporter: Yuanbo Liu
> Priority: Major
>
> We have hundreds of Kafka topics that need to be consumed in one application. The application master throws an OOM exception after hanging for nearly half an hour.
[jira] [Commented] (SPARK-24528) Missing optimization for Aggregations/Windowing on a bucketed table
[ https://issues.apache.org/jira/browse/SPARK-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529361#comment-16529361 ] Liang-Chi Hsieh commented on SPARK-24528: - I think we can have a sql config to control enabling/disabling this behavior too. > Missing optimization for Aggregations/Windowing on a bucketed table > --- > > Key: SPARK-24528 > URL: https://issues.apache.org/jira/browse/SPARK-24528 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Ohad Raviv >Priority: Major > > Closely related to SPARK-24410, we're trying to optimize a very common use > case we have of getting the most updated row by id from a fact table. > We're saving the table bucketed to skip the shuffle stage, but we still > "waste" time on the Sort operator even though the data is already sorted. > here's a good example: > {code:java} > sparkSession.range(N).selectExpr( > "id as key", > "id % 2 as t1", > "id % 3 as t2") > .repartition(col("key")) > .write > .mode(SaveMode.Overwrite) > .bucketBy(3, "key") > .sortBy("key", "t1") > .saveAsTable("a1"){code} > {code:java} > sparkSession.sql("select max(struct(t1, *)) from a1 group by key").explain > == Physical Plan == > SortAggregate(key=[key#24L], functions=[max(named_struct(t1, t1#25L, key, > key#24L, t1, t1#25L, t2, t2#26L))]) > +- SortAggregate(key=[key#24L], functions=[partial_max(named_struct(t1, > t1#25L, key, key#24L, t1, t1#25L, t2, t2#26L))]) > +- *(1) FileScan parquet default.a1[key#24L,t1#25L,t2#26L] Batched: true, > Format: Parquet, Location: ...{code} > > and here's a bad example, but more realistic: > {code:java} > sparkSession.sql("set spark.sql.shuffle.partitions=2") > sparkSession.sql("select max(struct(t1, *)) from a1 group by key").explain > == Physical Plan == > SortAggregate(key=[key#32L], functions=[max(named_struct(t1, t1#33L, key, > key#32L, t1, t1#33L, t2, t2#34L))]) > +- SortAggregate(key=[key#32L], 
functions=[partial_max(named_struct(t1, > t1#33L, key, key#32L, t1, t1#33L, t2, t2#34L))]) > +- *(1) Sort [key#32L ASC NULLS FIRST], false, 0 > +- *(1) FileScan parquet default.a1[key#32L,t1#33L,t2#34L] Batched: true, > Format: Parquet, Location: ... > {code} > > I've traced the problem to DataSourceScanExec#235: > {code:java} > val sortOrder = if (sortColumns.nonEmpty) { > // In case of bucketing, its possible to have multiple files belonging to > the > // same bucket in a given relation. Each of these files are locally sorted > // but those files combined together are not globally sorted. Given that, > // the RDD partition will not be sorted even if the relation has sort > columns set > // Current solution is to check if all the buckets have a single file in it > val files = selectedPartitions.flatMap(partition => partition.files) > val bucketToFilesGrouping = > files.map(_.getPath.getName).groupBy(file => > BucketingUtils.getBucketId(file)) > val singleFilePartitions = bucketToFilesGrouping.forall(p => p._2.length <= > 1){code} > so obviously the code avoids dealing with this situation now.. > could you think of a way to solve this or bypass it? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
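The guard quoted from DataSourceScanExec can be mirrored outside Spark. A minimal Python sketch (hypothetical helper names, not Spark code) of the "one file per bucket" condition the reporter traced the problem to:

```python
from collections import defaultdict

def buckets_have_single_file(file_names, bucket_id_of):
    # Mirrors the quoted check: per-file sort order can only be trusted as an
    # RDD-partition sort order when every bucket is backed by at most one
    # (locally sorted) file; otherwise a Sort node must be inserted.
    files_per_bucket = defaultdict(list)
    for name in file_names:
        files_per_bucket[bucket_id_of(name)].append(name)
    return all(len(files) <= 1 for files in files_per_bucket.values())
```

A possible direction for the ticket, under the same assumption, would be merge-sorting the multiple locally sorted files of a bucket instead of falling back to a full Sort.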
[jira] [Commented] (SPARK-24714) AnalysisSuite should use ClassTag to check the runtime instance
[ https://issues.apache.org/jira/browse/SPARK-24714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529356#comment-16529356 ] Chia-Ping Tsai commented on SPARK-24714: I have no permission to assign this JIRA to myself; I need help. > AnalysisSuite should use ClassTag to check the runtime instance > --- > > Key: SPARK-24714 > URL: https://issues.apache.org/jira/browse/SPARK-24714 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.3.1 >Reporter: Chia-Ping Tsai >Priority: Minor > > {code:java} > test("SPARK-22614 RepartitionByExpression partitioning") { > def checkPartitioning[T <: Partitioning](numPartitions: Int, exprs: > Expression*): Unit = { > val partitioning = RepartitionByExpression(exprs, testRelation2, > numPartitions).partitioning > assert(partitioning.isInstanceOf[T]) // it will always be true because of type > erasure > }{code} > Spark supports Scala 2.10 and 2.11, so it is fine to introduce ClassTag to > correct the type check. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24714) AnalysisSuite should use ClassTag to check the runtime instance
Chia-Ping Tsai created SPARK-24714: -- Summary: AnalysisSuite should use ClassTag to check the runtime instance Key: SPARK-24714 URL: https://issues.apache.org/jira/browse/SPARK-24714 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.3.1 Reporter: Chia-Ping Tsai {code:java} test("SPARK-22614 RepartitionByExpression partitioning") { def checkPartitioning[T <: Partitioning](numPartitions: Int, exprs: Expression*): Unit = { val partitioning = RepartitionByExpression(exprs, testRelation2, numPartitions).partitioning assert(partitioning.isInstanceOf[T]) // it will always be true because of type erasure }{code} Spark supports Scala 2.10 and 2.11, so it is fine to introduce ClassTag to correct the type check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
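Scala's type erasure has no exact Python analog, but the shape of the proposed fix — pass the expected runtime class as a value instead of relying on an erased type parameter — can be sketched (hypothetical helper name) as:

```python
def check_partitioning(partitioning, expected_cls):
    # expected_cls plays the role Scala's ClassTag would play: because the
    # class is a runtime value, the assertion is meaningful rather than
    # vacuously true the way an erased isInstanceOf[T] check is.
    assert isinstance(partitioning, expected_cls), (
        f"expected {expected_cls.__name__}, got {type(partitioning).__name__}")
```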
[jira] [Commented] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken
[ https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529352#comment-16529352 ] Saisai Shao commented on SPARK-24530: - [~hyukjin.kwon], Spark 2.1.3 and 2.2.2 are on vote; can you please fix the issue and leave comments in the related threads? > Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) > and pyspark.ml docs are broken > --- > > Key: SPARK-24530 > URL: https://issues.apache.org/jira/browse/SPARK-24530 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.4.0 >Reporter: Xiangrui Meng >Assignee: Hyukjin Kwon >Priority: Blocker > Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot > 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, > pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png > > > I generated python docs from master locally using `make html`. However, the > generated html doc doesn't render class docs correctly. I attached > screenshots from the Spark 2.3 docs and from master docs generated locally. Not > sure if this is because of my local setup. > cc: [~dongjoon] Could you help verify? > > The following is the status of our released docs. Some recent docs seem to be > broken. 
> *2.1.x* > (O) > [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > (O) > [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > (X) > [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > *2.2.x* > (O) > [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > (X) > [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > *2.3.x* > (O) > [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > (X) > [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken
[ https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-24530: Target Version/s: 2.1.3, 2.2.2, 2.3.2, 2.4.0 (was: 2.4.0) > Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) > and pyspark.ml docs are broken > --- > > Key: SPARK-24530 > URL: https://issues.apache.org/jira/browse/SPARK-24530 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.4.0 >Reporter: Xiangrui Meng >Assignee: Hyukjin Kwon >Priority: Blocker > Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot > 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, > pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png > > > I generated python docs from master locally using `make html`. However, the > generated html doc doesn't render class docs correctly. I attached the > screenshot from Spark 2.3 docs and master docs generated on my local. Not > sure if this is because my local setup. > cc: [~dongjoon] Could you help verify? > > The followings are our released doc status. Some recent docs seems to be > broken. 
> *2.1.x* > (O) > [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > (O) > [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > (X) > [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > *2.2.x* > (O) > [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > (X) > [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > *2.3.x* > (O) > [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] > (X) > [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24713) AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed
Yuanbo Liu created SPARK-24713: -- Summary: AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed Key: SPARK-24713 URL: https://issues.apache.org/jira/browse/SPARK-24713 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.3.1 Reporter: Yuanbo Liu We have hundreds of Kafka topics that need to be consumed in one application. The application master will throw an OOM exception after hanging for nearly half an hour. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24711) Integration tests will not work with exclude/include tags
[ https://issues.apache.org/jira/browse/SPARK-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-24711: Priority: Minor (was: Major) > Integration tests will not work with exclude/include tags > - > > Key: SPARK-24711 > URL: https://issues.apache.org/jira/browse/SPARK-24711 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.1 >Reporter: Stavros Kontopoulos >Priority: Minor > Fix For: 2.4.0 > > > I tried to exclude some tests when adding mine and I got errors of the form: > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 6.798 s > [INFO] Finished at: 2018-07-01T18:34:13+03:00 > [INFO] Final Memory: 36M/652M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on > project spark-kubernetes-integration-tests_2.11: There are test failures. > [ERROR] > [ERROR] Please refer to > /home/stavros/Desktop/workspace/OSS/spark/resource-managers/kubernetes/integration-tests/target/surefire-reports > for the individual test results. > [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, > [date].dumpstream and [date]-jvmRun[N].dumpstream. > [ERROR] There was an error in the forked process > [ERROR] Unable to load category: noDcos > > This will not happen if maven surfire plugin is disabled as stated here: > [http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin] > I will create a PR shortly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21809) Change Stage Page to use datatables to support sorting columns and searching
[ https://issues.apache.org/jira/browse/SPARK-21809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529206#comment-16529206 ] Apache Spark commented on SPARK-21809: -- User 'pgandhi999' has created a pull request for this issue: https://github.com/apache/spark/pull/21688 > Change Stage Page to use datatables to support sorting columns and searching > > > Key: SPARK-21809 > URL: https://issues.apache.org/jira/browse/SPARK-21809 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.2.0 >Reporter: Nuochen Lyu >Priority: Minor > > Support column sort and search for the Stage page using jQuery DataTable and > REST API. Before this change, the Stage page was generated as hard-coded HTML > and could not support search; also, sorting was disabled if any > application had more than one attempt. Supporting search and sort (over > all applications rather than the 20 entries in the current page) in any case > will greatly improve the user experience. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24712) TrainValidationSplit ignores label column name and forces to be "label"
[ https://issues.apache.org/jira/browse/SPARK-24712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo J. Villacorta updated SPARK-24712: Description: When a TrainValidationSplit is fit on a Pipeline containing a ML model, the labelCol property of the model is ignored, and the call to fit() will fail unless the labelCol equals "label". As an example, the following pyspark code only works when the variable labelColumn is set to "label" {code:java} from pyspark.sql.functions import rand, randn from pyspark.ml.regression import LinearRegression labelColumn = "target" # CHANGE THIS TO "label" AND THE CODE WORKS df = spark.range(0, 10).select(rand(seed=10).alias("uniform"), randn(seed=27).alias(labelColumn)) vectorAssembler = VectorAssembler().setInputCols(["uniform"]).setOutputCol("features") lr = LinearRegression().setFeaturesCol("features").setLabelCol(labelColumn) mypipeline = Pipeline(stages = [vectorAssembler, lr]) paramGrid = ParamGridBuilder()\ .addGrid(lr.regParam, [0.01, 0.1])\ .build() trainValidationSplit = TrainValidationSplit()\ .setEstimator(mypipeline)\ .setEvaluator(RegressionEvaluator())\ .setEstimatorParamMaps(paramGrid)\ .setTrainRatio(0.8) trainValidationSplit.fit(df) # FAIL UNLESS labelColumn IS SET TO "label" {code} was: When a TrainValidationSplit is fit on a Pipeline containing a ML model, the labelCol property of the model is ignored, and the call to fit() will fail unless the labelCol equals "label". 
As an example, the following pyspark code only wors when the variable labelColumn is set to "label" {code:java} from pyspark.sql.functions import rand, randn from pyspark.ml.regression import LinearRegression labelColumn = "target" # CHANGE THIS TO "label" AND THE CODE WORKS df = spark.range(0, 10).select(rand(seed=10).alias("uniform"), randn(seed=27).alias(labelColumn)) vectorAssembler = VectorAssembler().setInputCols(["uniform"]).setOutputCol("features") lr = LinearRegression().setFeaturesCol("features").setLabelCol(labelColumn) mypipeline = Pipeline(stages = [vectorAssembler, lr]) paramGrid = ParamGridBuilder()\ .addGrid(lr.regParam, [0.01, 0.1])\ .build() trainValidationSplit = TrainValidationSplit()\ .setEstimator(mypipeline)\ .setEvaluator(RegressionEvaluator())\ .setEstimatorParamMaps(paramGrid)\ .setTrainRatio(0.8) trainValidationSplit.fit(df) # FAIL UNLESS labelColumn IS SET TO "label" {code} > TrainValidationSplit ignores label column name and forces to be "label" > --- > > Key: SPARK-24712 > URL: https://issues.apache.org/jira/browse/SPARK-24712 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.2.0 >Reporter: Pablo J. Villacorta >Priority: Major > > When a TrainValidationSplit is fit on a Pipeline containing a ML model, the > labelCol property of the model is ignored, and the call to fit() will fail > unless the labelCol equals "label". 
As an example, the following pyspark code > only works when the variable labelColumn is set to "label" > {code:java} > from pyspark.sql.functions import rand, randn > from pyspark.ml.regression import LinearRegression > labelColumn = "target" # CHANGE THIS TO "label" AND THE CODE WORKS > df = spark.range(0, 10).select(rand(seed=10).alias("uniform"), > randn(seed=27).alias(labelColumn)) > vectorAssembler = > VectorAssembler().setInputCols(["uniform"]).setOutputCol("features") > lr = LinearRegression().setFeaturesCol("features").setLabelCol(labelColumn) > mypipeline = Pipeline(stages = [vectorAssembler, lr]) > paramGrid = ParamGridBuilder()\ > .addGrid(lr.regParam, [0.01, 0.1])\ > .build() > trainValidationSplit = TrainValidationSplit()\ > .setEstimator(mypipeline)\ > .setEvaluator(RegressionEvaluator())\ > .setEstimatorParamMaps(paramGrid)\ > .setTrainRatio(0.8) > trainValidationSplit.fit(df) # FAIL UNLESS labelColumn IS SET TO "label" > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
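A plausible reading of this report (an assumption, not confirmed in the ticket) is that the default `RegressionEvaluator`, not the pipeline itself, hard-codes `labelCol="label"`. A Spark-free Python stand-in of that column-name mismatch:

```python
def evaluate(rows, label_col="label"):
    # Stand-in evaluator: reads a hard-coded default column name, analogous
    # to RegressionEvaluator's labelCol defaulting to "label".
    return sum(abs(r[label_col] - r["prediction"]) for r in rows) / len(rows)

rows = [{"target": 1.0, "prediction": 0.9}]
try:
    evaluate(rows)                         # fails: looks for a "label" column
except KeyError:
    pass
error = evaluate(rows, label_col="target")  # works once the names agree
```

If that reading is right, setting the evaluator's label column to match the model's would be the workaround while the bug stands.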
[jira] [Created] (SPARK-24712) TrainValidationSplit ignores label column name and forces to be "label"
Pablo J. Villacorta created SPARK-24712: --- Summary: TrainValidationSplit ignores label column name and forces to be "label" Key: SPARK-24712 URL: https://issues.apache.org/jira/browse/SPARK-24712 Project: Spark Issue Type: Bug Components: ML Affects Versions: 2.2.0 Reporter: Pablo J. Villacorta When a TrainValidationSplit is fit on a Pipeline containing an ML model, the labelCol property of the model is ignored, and the call to fit() will fail unless the labelCol equals "label". As an example, the following pyspark code only works when the variable labelColumn is set to "label" {code:java} from pyspark.sql.functions import rand, randn from pyspark.ml.regression import LinearRegression labelColumn = "target" # CHANGE THIS TO "label" AND THE CODE WORKS df = spark.range(0, 10).select(rand(seed=10).alias("uniform"), randn(seed=27).alias(labelColumn)) vectorAssembler = VectorAssembler().setInputCols(["uniform"]).setOutputCol("features") lr = LinearRegression().setFeaturesCol("features").setLabelCol(labelColumn) mypipeline = Pipeline(stages = [vectorAssembler, lr]) paramGrid = ParamGridBuilder()\ .addGrid(lr.regParam, [0.01, 0.1])\ .build() trainValidationSplit = TrainValidationSplit()\ .setEstimator(mypipeline)\ .setEvaluator(RegressionEvaluator())\ .setEstimatorParamMaps(paramGrid)\ .setTrainRatio(0.8) trainValidationSplit.fit(df) # FAIL UNLESS labelColumn IS SET TO "label" {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24711) Integration tests will not work with exclude/include tags
[ https://issues.apache.org/jira/browse/SPARK-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-24711: Description: I tried to exclude some tests when adding mine and I got errors of the form: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 6.798 s [INFO] Finished at: 2018-07-01T18:34:13+03:00 [INFO] Final Memory: 36M/652M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on project spark-kubernetes-integration-tests_2.11: There are test failures. [ERROR] [ERROR] Please refer to /home/stavros/Desktop/workspace/OSS/spark/resource-managers/kubernetes/integration-tests/target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream. [ERROR] There was an error in the forked process [ERROR] Unable to load category: noDcos This will not happen if maven surfire plugin is disabled as stated here: [http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin] I will create a PR shortly. was: I tried to exclude some tests when adding mine and I got errors of the form: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 6.798 s [INFO] Finished at: 2018-07-01T18:34:13+03:00 [INFO] Final Memory: 36M/652M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on project spark-kubernetes-integration-tests_2.11: There are test failures. [ERROR] [ERROR] Please refer to /home/stavros/Desktop/workspace/OSS/spark/resource-managers/kubernetes/integration-tests/target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream. 
[ERROR] There was an error in the forked process [ERROR] Unable to load category: noDcos This will not happen if maven surfire plugin is disabled as stated here: [http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin] > Integration tests will not work with exclude/include tags > - > > Key: SPARK-24711 > URL: https://issues.apache.org/jira/browse/SPARK-24711 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.1 >Reporter: Stavros Kontopoulos >Priority: Major > Fix For: 2.4.0 > > > I tried to exclude some tests when adding mine and I got errors of the form: > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 6.798 s > [INFO] Finished at: 2018-07-01T18:34:13+03:00 > [INFO] Final Memory: 36M/652M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on > project spark-kubernetes-integration-tests_2.11: There are test failures. > [ERROR] > [ERROR] Please refer to > /home/stavros/Desktop/workspace/OSS/spark/resource-managers/kubernetes/integration-tests/target/surefire-reports > for the individual test results. > [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, > [date].dumpstream and [date]-jvmRun[N].dumpstream. > [ERROR] There was an error in the forked process > [ERROR] Unable to load category: noDcos > > This will not happen if maven surfire plugin is disabled as stated here: > [http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin] > I will create a PR shortly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24711) Integration tests will not work with exclude/include tags
[ https://issues.apache.org/jira/browse/SPARK-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-24711: Description: I tried to exclude some tests when adding mine and I got errors of the form: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 6.798 s [INFO] Finished at: 2018-07-01T18:34:13+03:00 [INFO] Final Memory: 36M/652M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on project spark-kubernetes-integration-tests_2.11: There are test failures. [ERROR] [ERROR] Please refer to /home/stavros/Desktop/workspace/OSS/spark/resource-managers/kubernetes/integration-tests/target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream. [ERROR] There was an error in the forked process [ERROR] Unable to load category: noDcos This will not happen if maven surfire plugin is disabled as stated here: [http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin] was: I tried to exclude some tests when adding mine and I got errors of the form: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 6.798 s [INFO] Finished at: 2018-07-01T18:34:13+03:00 [INFO] Final Memory: 36M/652M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on project spark-kubernetes-integration-tests_2.11: There are test failures. [ERROR] [ERROR] Please refer to /home/stavros/Desktop/workspace/OSS/spark/resource-managers/kubernetes/integration-tests/target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream. 
[ERROR] There was an error in the forked process [ERROR] Unable to load category: noDcos This will not happen if maven surfire plugin is disabled as stated here: http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin > Integration tests will not work with exclude/include tags > - > > Key: SPARK-24711 > URL: https://issues.apache.org/jira/browse/SPARK-24711 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.1 >Reporter: Stavros Kontopoulos >Priority: Major > Fix For: 2.4.0 > > > I tried to exclude some tests when adding mine and I got errors of the form: > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 6.798 s > [INFO] Finished at: 2018-07-01T18:34:13+03:00 > [INFO] Final Memory: 36M/652M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on > project spark-kubernetes-integration-tests_2.11: There are test failures. > [ERROR] > [ERROR] Please refer to > /home/stavros/Desktop/workspace/OSS/spark/resource-managers/kubernetes/integration-tests/target/surefire-reports > for the individual test results. > [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, > [date].dumpstream and [date]-jvmRun[N].dumpstream. > [ERROR] There was an error in the forked process > [ERROR] Unable to load category: noDcos > > This will not happen if maven surfire plugin is disabled as stated here: > [http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24711) Integration tests will not work with exclude/include tags
Stavros Kontopoulos created SPARK-24711: --- Summary: Integration tests will not work with exclude/include tags Key: SPARK-24711 URL: https://issues.apache.org/jira/browse/SPARK-24711 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 2.3.1 Reporter: Stavros Kontopoulos Fix For: 2.4.0 I tried to exclude some tests when adding mine and I got errors of the form: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 6.798 s [INFO] Finished at: 2018-07-01T18:34:13+03:00 [INFO] Final Memory: 36M/652M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on project spark-kubernetes-integration-tests_2.11: There are test failures. [ERROR] [ERROR] Please refer to /home/stavros/Desktop/workspace/OSS/spark/resource-managers/kubernetes/integration-tests/target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream. [ERROR] There was an error in the forked process [ERROR] Unable to load category: noDcos This will not happen if the Maven Surefire plugin is disabled as stated here: http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
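Per the linked scalatest guide, the usual shape of the fix is a pom change; a hedged sketch (exact coordinates and placement may differ per module):

```xml
<!-- Sketch: skip Surefire so only the scalatest-maven-plugin, which
     understands include/exclude tags, runs the test suites. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <skipTests>true</skipTests>
  </configuration>
</plugin>
```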
[jira] [Commented] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529108#comment-16529108 ] Apache Spark commented on SPARK-24165: -- User 'mn-mikke' has created a pull request for this issue: https://github.com/apache/spark/pull/21687 > UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column types > and returns an Array of (Seq[case class], Double) as output. Since some > values in the input columns can be null, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such a function works well > when I test it in the spark shell. But running as a Scala jar in spark-submit with > yarn cluster mode, it raises a NullPointerException which points to the UDF > function. If I remove the when().otherwise() condition but put the null check > inside the UDF, the function works without issue in spark-submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
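The reporter's working variant — null-checking inside the UDF rather than relying on when()/otherwise() to short-circuit evaluation — can be sketched without Spark as a small wrapper (hypothetical name):

```python
def null_safe(f):
    # Guard nulls inside the UDF body itself, as the report describes,
    # instead of trusting when(col.isNull(), ...) to skip the call.
    def wrapped(x):
        return None if x is None else f(x)
    return wrapped

# In PySpark this wrapped callable would then be registered with udf();
# here we just exercise the guard directly.
safe_len = null_safe(len)
```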
[jira] [Assigned] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24165: Assignee: Apache Spark > UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Assignee: Apache Spark >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column type > and returns an Array of (Seq[case class], Double) as output. Since some of > values in input columns can be nullable, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such function works well > when I test in spark shell. But running as a scala jar in spark-submit with > yarn cluster mode, it raised NullPointerException which points to the UDF > function. If I remove the when().otherwsie() condition, but put null check > inside the UDF, the function works without issue in spark-submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24165: Assignee: (was: Apache Spark) > UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column type > and returns an Array of (Seq[case class], Double) as output. Since some of > values in input columns can be nullable, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such function works well > when I test in spark shell. But running as a scala jar in spark-submit with > yarn cluster mode, it raised NullPointerException which points to the UDF > function. If I remove the when().otherwsie() condition, but put null check > inside the UDF, the function works without issue in spark-submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24710) Information Gain Ratio for decision trees
Pablo J. Villacorta created SPARK-24710: --- Summary: Information Gain Ratio for decision trees Key: SPARK-24710 URL: https://issues.apache.org/jira/browse/SPARK-24710 Project: Spark Issue Type: New Feature Components: ML Affects Versions: 2.3.1 Reporter: Pablo J. Villacorta Fix For: 2.3.1 Spark currently uses Information Gain (IG) to decide the next feature to branch on when building a decision tree. In case of categorical features, IG is known to be biased towards features with a large number of categories. [Information Gain Ratio|https://en.wikipedia.org/wiki/Information_gain_ratio] solves this problem by dividing the IG by a number that characterizes the intrinsic information of a feature. As far as I know, Spark has IG but not IGR. It would be nice to have the possibility to choose IGR instead of IG. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
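The quantity the ticket requests can be sketched in plain Python (the function names below are mine, not Spark API): IGR divides IG by the intrinsic value IV, the entropy of the feature's own split proportions, which penalizes features with many categories:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(feature, labels):
    """IG = H(labels) - sum_v |S_v|/|S| * H(S_v) for each feature value v."""
    n = len(labels)
    by_value = {}
    for x, y in zip(feature, labels):
        by_value.setdefault(x, []).append(y)
    remainder = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    return entropy(labels) - remainder

def information_gain_ratio(feature, labels):
    """IGR = IG / IV, where IV is the entropy of the split itself."""
    iv = entropy(feature)  # intrinsic value: large for many-valued features
    return information_gain(feature, labels) / iv if iv > 0 else 0.0

# An ID-like feature with one category per row has maximal IG (1.0 here),
# but its large intrinsic value (2 bits) halves the ratio:
print(information_gain(["a", "b", "c", "d"], [0, 0, 1, 1]))        # 1.0
print(information_gain_ratio(["a", "b", "c", "d"], [0, 0, 1, 1]))  # 0.5
```

This is exactly the bias the ticket mentions: plain IG ranks the ID-like feature as perfect, while IGR discounts it.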
[jira] [Comment Edited] (SPARK-24642) Add a function which infers schema from a JSON column
[ https://issues.apache.org/jira/browse/SPARK-24642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529030#comment-16529030 ] Maxim Gekk edited comment on SPARK-24642 at 7/1/18 10:05 AM: - [~rxin] I created a new ticket, SPARK-24709, which aims to add a simpler function. Here is the PR https://github.com/apache/spark/pull/21686 for the ticket. was (Author: maxgekk): I created a new ticket, SPARK-24709, which aims to add a simpler function. > Add a function which infers schema from a JSON column > - > > Key: SPARK-24642 > URL: https://issues.apache.org/jira/browse/SPARK-24642 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Minor > > Need to add a new aggregate function - *infer_schema()*. The function should > infer the schema for a set of JSON strings. The result of the function is a schema > in DDL format (or JSON format). > One of the use cases is passing the output of *infer_schema()* to *from_json()*. > Currently, the from_json() function requires a schema as a mandatory > argument. It is possible to infer the schema programmatically in Scala/Python and > pass it as the second argument, but in SQL it is not possible. A user has to > pass the schema as a string literal in SQL. The new function should allow it to be used > in SQL as in the example: > {code:sql} > select from_json(json_col, infer_schema(json_col)) > from json_table; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24709) Inferring schema from JSON string literal
[ https://issues.apache.org/jira/browse/SPARK-24709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24709: Assignee: Apache Spark > Inferring schema from JSON string literal > - > > Key: SPARK-24709 > URL: https://issues.apache.org/jira/browse/SPARK-24709 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Minor > > Need to add a new function - *schema_of_json()*. The function should infer > the schema of a JSON string literal. The result of the function is a schema in DDL > format. > One of the use cases is passing the output of _schema_of_json()_ to > *from_json()*. Currently, the _from_json()_ function requires a schema as a > mandatory argument. A user has to pass a schema as a string literal in SQL. > The new function should allow inferring the schema from an example. Let's say > json_col is a column containing JSON strings with the same schema. It should > be possible to pass a JSON string with that schema to _schema_of_json()_, > which infers the schema from the particular example. > {code:sql} > select from_json(json_col, schema_of_json('{"f1": 0, "f2": [0], "f3": "a"}')) > from json_table; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24709) Inferring schema from JSON string literal
[ https://issues.apache.org/jira/browse/SPARK-24709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24709: Assignee: (was: Apache Spark) > Inferring schema from JSON string literal > - > > Key: SPARK-24709 > URL: https://issues.apache.org/jira/browse/SPARK-24709 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Minor > > Need to add a new function - *schema_of_json()*. The function should infer > the schema of a JSON string literal. The result of the function is a schema in DDL > format. > One of the use cases is passing the output of _schema_of_json()_ to > *from_json()*. Currently, the _from_json()_ function requires a schema as a > mandatory argument. A user has to pass a schema as a string literal in SQL. > The new function should allow inferring the schema from an example. Let's say > json_col is a column containing JSON strings with the same schema. It should > be possible to pass a JSON string with that schema to _schema_of_json()_, > which infers the schema from the particular example. > {code:sql} > select from_json(json_col, schema_of_json('{"f1": 0, "f2": [0], "f3": "a"}')) > from json_table; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24709) Inferring schema from JSON string literal
[ https://issues.apache.org/jira/browse/SPARK-24709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529038#comment-16529038 ] Apache Spark commented on SPARK-24709: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/21686 > Inferring schema from JSON string literal > - > > Key: SPARK-24709 > URL: https://issues.apache.org/jira/browse/SPARK-24709 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Minor > > Need to add a new function - *schema_of_json()*. The function should infer > the schema of a JSON string literal. The result of the function is a schema in DDL > format. > One of the use cases is passing the output of _schema_of_json()_ to > *from_json()*. Currently, the _from_json()_ function requires a schema as a > mandatory argument. A user has to pass a schema as a string literal in SQL. > The new function should allow inferring the schema from an example. Let's say > json_col is a column containing JSON strings with the same schema. It should > be possible to pass a JSON string with that schema to _schema_of_json()_, > which infers the schema from the particular example. > {code:sql} > select from_json(json_col, schema_of_json('{"f1": 0, "f2": [0], "f3": "a"}')) > from json_table; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24642) Add a function which infers schema from a JSON column
[ https://issues.apache.org/jira/browse/SPARK-24642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529030#comment-16529030 ] Maxim Gekk commented on SPARK-24642: I created a new ticket, SPARK-24709, which aims to add a simpler function. > Add a function which infers schema from a JSON column > - > > Key: SPARK-24642 > URL: https://issues.apache.org/jira/browse/SPARK-24642 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Minor > > Need to add a new aggregate function - *infer_schema()*. The function should > infer the schema for a set of JSON strings. The result of the function is a schema > in DDL format (or JSON format). > One of the use cases is passing the output of *infer_schema()* to *from_json()*. > Currently, the from_json() function requires a schema as a mandatory > argument. It is possible to infer the schema programmatically in Scala/Python and > pass it as the second argument, but in SQL it is not possible. A user has to > pass the schema as a string literal in SQL. The new function should allow it to be used > in SQL as in the example: > {code:sql} > select from_json(json_col, infer_schema(json_col)) > from json_table; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24642) Add a function which infers schema from a JSON column
[ https://issues.apache.org/jira/browse/SPARK-24642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk resolved SPARK-24642. Resolution: Won't Fix > Add a function which infers schema from a JSON column > - > > Key: SPARK-24642 > URL: https://issues.apache.org/jira/browse/SPARK-24642 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Priority: Minor > > Need to add a new aggregate function - *infer_schema()*. The function should > infer the schema for a set of JSON strings. The result of the function is a schema > in DDL format (or JSON format). > One of the use cases is passing the output of *infer_schema()* to *from_json()*. > Currently, the from_json() function requires a schema as a mandatory > argument. It is possible to infer the schema programmatically in Scala/Python and > pass it as the second argument, but in SQL it is not possible. A user has to > pass the schema as a string literal in SQL. The new function should allow it to be used > in SQL as in the example: > {code:sql} > select from_json(json_col, infer_schema(json_col)) > from json_table; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24709) Inferring schema from JSON string literal
Maxim Gekk created SPARK-24709: -- Summary: Inferring schema from JSON string literal Key: SPARK-24709 URL: https://issues.apache.org/jira/browse/SPARK-24709 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.1 Reporter: Maxim Gekk Need to add a new function - *schema_of_json()*. The function should infer the schema of a JSON string literal. The result of the function is a schema in DDL format. One of the use cases is passing the output of _schema_of_json()_ to *from_json()*. Currently, the _from_json()_ function requires a schema as a mandatory argument. A user has to pass a schema as a string literal in SQL. The new function should allow inferring the schema from an example. Let's say json_col is a column containing JSON strings with the same schema. It should be possible to pass a JSON string with that schema to _schema_of_json()_, which infers the schema from the particular example. {code:sql} select from_json(json_col, schema_of_json('{"f1": 0, "f2": [0], "f3": "a"}')) from json_table; {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
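What the proposed schema_of_json() would do can be illustrated with a toy sketch in plain Python. This is my illustration, not Spark's implementation; the DDL type names are loosely modeled on Spark's format:

```python
import json

def schema_of_json(example):
    """Toy sketch: derive a DDL-like schema string from one JSON example."""
    def infer(value):
        # bool must be checked before int: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            return "BOOLEAN"
        if isinstance(value, int):
            return "BIGINT"
        if isinstance(value, float):
            return "DOUBLE"
        if isinstance(value, str):
            return "STRING"
        if isinstance(value, list):
            inner = infer(value[0]) if value else "STRING"
            return f"ARRAY<{inner}>"
        if isinstance(value, dict):
            fields = ", ".join(f"{k}: {infer(v)}" for k, v in value.items())
            return f"STRUCT<{fields}>"
        return "STRING"  # null and anything unrecognized

    return infer(json.loads(example))

print(schema_of_json('{"f1": 0, "f2": [0], "f3": "a"}'))
# STRUCT<f1: BIGINT, f2: ARRAY<BIGINT>, f3: STRING>
```

The string it returns plays the role of the second argument to from_json() in the ticket's SQL example, which is exactly why a SQL-callable inference function is useful: the schema no longer has to be hand-written as a literal.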
[jira] [Comment Edited] (SPARK-24621) WebUI - application 'name' urls point to http instead of https (even when ssl enabled)
[ https://issues.apache.org/jira/browse/SPARK-24621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528572#comment-16528572 ] t oo edited comment on SPARK-24621 at 7/1/18 9:36 AM: -- [https://github.com/apache/spark/pull/21514/commits] [core/src/main/scala/org/apache/spark/deploy/master/Master.scala|https://github.com/apache/spark#diff-29dffdccd5a7f4c8b496c293e87c8668] val SSL_ENABLED = conf.getBoolean("spark.ssl.enabled", false) var uriScheme = "http://"; if (SSL_ENABLED) { uriScheme = "https://"; } masterWebUiUrl = uriScheme + masterPublicAddress + ":" + webUi.boundPort //masterWebUiUrl = "http://"; + masterPublicAddress + ":" + webUi.boundPort was (Author: toopt4): [https://github.com/apache/spark/pull/21514/commits] [core/src/main/scala/org/apache/spark/deploy/master/Master.scala|https://github.com/apache/spark#diff-29dffdccd5a7f4c8b496c293e87c8668] val SSL_ENABLED = conf.getBoolean("spark.ssl.enabled", false) val uriScheme = "http://"; if (SSL_ENABLED) { uriScheme = "https://"; } masterWebUiUrl = uriScheme + masterPublicAddress + ":" + webUi.boundPort //masterWebUiUrl = "http://"; + masterPublicAddress + ":" + webUi.boundPort > WebUI - application 'name' urls point to http instead of https (even when ssl > enabled) > -- > > Key: SPARK-24621 > URL: https://issues.apache.org/jira/browse/SPARK-24621 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: t oo >Priority: Major > Attachments: spark_master-one-app.png > > > See attached > ApplicationID correctly points to DNS url > but Name points to IP address > Update: I found setting SPARK_PUBLIC_DNS to DNS hostname will make Name point > to DNS. BUT it will use http instead of https! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24708) Document the default spark url of master in standalone is "spark://localhost:7077"
Chia-Ping Tsai created SPARK-24708: -- Summary: Document the default spark url of master in standalone is "spark://localhost:7077" Key: SPARK-24708 URL: https://issues.apache.org/jira/browse/SPARK-24708 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.3.1 Reporter: Chia-Ping Tsai In the section "Starting a Cluster Manually" we give an example of starting a worker. {code:java} ./sbin/start-slave.sh <master-spark-URL> {code} However, we only mention the default "web port", so readers may be misled into using the "web port" to start the worker. (of course, I am a "reader" too :() It seems to me that adding a brief description of the master's default spark URL would avoid the above ambiguity, for example: {code:java} - Similarly, you can start one or more workers and connect them to the master via: + Similarly, you can start one or more workers and connect them to the master's spark URL (default: spark://<master-host>:7077) via:{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529013#comment-16529013 ] Yuming Wang commented on SPARK-20427: - [~ORichard] Please try using {{customSchema}} to specify custom data types for the read schema. https://github.com/apache/spark/blob/v2.3.1/examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala#L197 > Issue with Spark interpreting Oracle datatype NUMBER > > > Key: SPARK-20427 > URL: https://issues.apache.org/jira/browse/SPARK-20427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Alexander Andrushenko >Assignee: Yuming Wang >Priority: Major > Fix For: 2.3.0 > > > In Oracle there exists a data type NUMBER. When defining a field in a table of type > NUMBER the field has two components, precision and scale. > For example, NUMBER(p,s) has precision p and scale s. > Precision can range from 1 to 38. > Scale can range from -84 to 127. > When reading such a field Spark can create numbers with precision exceeding > 38. In our case it has created fields with precision 44, > calculated as the sum of the precision (in our case 34 digits) and the scale (10): > "...java.lang.IllegalArgumentException: requirement failed: Decimal precision > 44 exceeds max precision 38...". > The result was that a data frame was read from a table on one schema but > could not be inserted into the identical table on another schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
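The failure mode in the quoted description can be reproduced arithmetically: Spark's DecimalType caps precision at 38 digits, so an Oracle NUMBER needing more total significant digits cannot be represented. A small pure-Python check (the helper name is mine; 38 is Spark's documented maximum decimal precision):

```python
# Spark's DecimalType allows at most 38 digits of precision. The reporter's
# value had 34 integer digits plus scale 10, requiring 44 digits in total,
# hence "Decimal precision 44 exceeds max precision 38".
SPARK_MAX_PRECISION = 38

def required_precision(integer_digits, scale):
    # Total significant digits needed to hold the value exactly:
    # digits before the decimal point plus digits after it.
    return integer_digits + scale

needed = required_precision(34, 10)
print(needed)                         # 44
print(needed > SPARK_MAX_PRECISION)   # True -> the read fails without a customSchema
```

The customSchema option suggested in the comment sidesteps this by letting the user declare a narrower type (e.g. a DECIMAL with explicit precision and scale, or DOUBLE) for the offending column instead of Spark's inferred one.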
[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529005#comment-16529005 ] Oliver Richardson commented on SPARK-20427: --- I'm still getting the same problem, even in the newest version. > Issue with Spark interpreting Oracle datatype NUMBER > > > Key: SPARK-20427 > URL: https://issues.apache.org/jira/browse/SPARK-20427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Alexander Andrushenko >Assignee: Yuming Wang >Priority: Major > Fix For: 2.3.0 > > > In Oracle there exists a data type NUMBER. When defining a field in a table of type > NUMBER the field has two components, precision and scale. > For example, NUMBER(p,s) has precision p and scale s. > Precision can range from 1 to 38. > Scale can range from -84 to 127. > When reading such a field Spark can create numbers with precision exceeding > 38. In our case it has created fields with precision 44, > calculated as the sum of the precision (in our case 34 digits) and the scale (10): > "...java.lang.IllegalArgumentException: requirement failed: Decimal precision > 44 exceeds max precision 38...". > The result was that a data frame was read from a table on one schema but > could not be inserted into the identical table on another schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org