[jira] [Closed] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kadner closed SPARK-10890.
Resolution: Fixed
Fix Version/s: 2.2.0, 2.1.1

This issue is no longer reproducible. It was still reproducible in v2.1.0-rc5.

> "Column count does not match; SQL statement:" error in JDBCWriteSuite
> -
>
> Key: SPARK-10890
> URL: https://issues.apache.org/jira/browse/SPARK-10890
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.5.0
> Reporter: Rick Hillegas
> Fix For: 2.1.1, 2.2.0
>
> I get the following error when I run the following test...
> mvn -Dhadoop.version=2.4.0 -DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test
> {noformat}
> JDBCWriteSuite:
> 13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
> - Basic CREATE
> - CREATE with overwrite
> - CREATE then INSERT to append
> - CREATE then INSERT to truncate
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 23.0 (TID 31)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
> at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
> at org.h2.message.DbException.get(DbException.java:179)
> at org.h2.message.DbException.get(DbException.java:155)
> at org.h2.message.DbException.get(DbException.java:144)
> at org.h2.command.dml.Insert.prepare(Insert.java:265)
> at org.h2.command.Parser.prepareCommand(Parser.java:247)
> at org.h2.engine.Session.prepareLocal(Session.java:446)
> at org.h2.engine.Session.prepareCommand(Session.java:388)
> at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
> at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
> at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 23.0 (TID 32)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
> at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
> at org.h2.message.DbException.get(DbException.java:179)
> at org.h2.message.DbException.get(DbException.java:155)
> at org.h2.message.DbException.get(DbException.java:144)
> at org.h2.command.dml.Insert.prepare(Insert.java:265)
> at org.h2.command.Parser.prepareCommand(Parser.java:247)
> at org.h2.engine.Session.prepareLocal(Session.java:446)
> at org.h2.engine.Session.prepareCommand(Session.java:388)
> at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
> at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
> at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
> at
[jira] [Updated] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kadner updated SPARK-10890:
-
Description:

I get the following error when I run the following test...

mvn -Dhadoop.version=2.4.0 -DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test

{noformat}
JDBCWriteSuite:
13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
- Basic CREATE
- CREATE with overwrite
- CREATE then INSERT to append
- CREATE then INSERT to truncate
13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 23.0 (TID 31)
org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
    at org.h2.message.DbException.get(DbException.java:179)
    at org.h2.message.DbException.get(DbException.java:155)
    at org.h2.message.DbException.get(DbException.java:144)
    at org.h2.command.dml.Insert.prepare(Insert.java:265)
    at org.h2.command.Parser.prepareCommand(Parser.java:247)
    at org.h2.engine.Session.prepareLocal(Session.java:446)
    at org.h2.engine.Session.prepareCommand(Session.java:388)
    at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
    at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
    at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 23.0 (TID 32)
org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
    at org.h2.message.DbException.get(DbException.java:179)
    at org.h2.message.DbException.get(DbException.java:155)
    at org.h2.message.DbException.get(DbException.java:144)
    at org.h2.command.dml.Insert.prepare(Insert.java:265)
    at org.h2.command.Parser.prepareCommand(Parser.java:247)
    at org.h2.engine.Session.prepareLocal(Session.java:446)
    at org.h2.engine.Session.prepareCommand(Session.java:388)
    at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
    at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
    at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
    at
[jira] [Updated] (SPARK-17803) Docker integration tests don't run with "Docker for Mac"
[ https://issues.apache.org/jira/browse/SPARK-17803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kadner updated SPARK-17803:
-
Description:

The {{docker-integration-tests}} can be run on Mac OS X with _[Docker Toolbox|https://docs.docker.com/toolbox/overview/]_. However, _Docker Toolbox_ comes with a cumbersome setup (VirtualBox VM running a _boot2docker_ Linux distro, tricky to do port mapping, ...).

A much preferable way to work with Docker on Mac OS X is to use the _[Docker for Mac|https://docs.docker.com/docker-for-mac/docker-toolbox/]_ native application, but Spark's {{docker-integration-tests}} don't run with _Docker for Mac_. The [work-around|https://github.com/spotify/docker-client#a-note-on-using-docker-for-mac] of setting {{DOCKER_HOST=unix:///var/run/docker.sock}} is not the most straightforward find.

We should upgrade the [spotify docker-client|https://mvnrepository.com/artifact/com.spotify/docker-client] dependency from 3.6.6 to 4.0.8 or later, which supports "Docker for Mac" out of the box.

was: The {{docker-integration-tests}} can be run on Mac OS X with _[Docker Toolbox|https://docs.docker.com/toolbox/overview/]_. However _Docker Toolbox_ comes with a cumbersome setup (VirtualBox VM running boot2docker Linux distro, tricky to do port mapping, ...). A much preferable way to work with Docker on Mac OS X it is to use the _[Docker for Mac|https://docs.docker.com/docker-for-mac/docker-toolbox/]_ native application but Spark's {{docker-integration-tests}} don't run with _Docker for Mac_. The [work-around|https://github.com/spotify/docker-client#a-note-on-using-docker-for-mac] to set {{DOCKER_HOST=unix:///var/run/docker.sock}} is not the most straight forward find. We should upgrade the [spotify docker-client|https://mvnrepository.com/artifact/com.spotify/docker-client] dependency from 3.6.6 to 4.0.8 or later which supports "Docker for Mac" out of the box.
> Docker integration tests don't run with "Docker for Mac"
>
> Key: SPARK-17803
> URL: https://issues.apache.org/jira/browse/SPARK-17803
> Project: Spark
> Issue Type: Dependency upgrade
> Components: Tests
> Affects Versions: 2.0.1
> Environment: Mac OS X 10.10.5, Docker for Mac
> Reporter: Christian Kadner
> Priority: Minor
>
> The {{docker-integration-tests}} can be run on Mac OS X with _[Docker Toolbox|https://docs.docker.com/toolbox/overview/]_. However, _Docker Toolbox_ comes with a cumbersome setup (VirtualBox VM running a _boot2docker_ Linux distro, tricky to do port mapping, ...).
> A much preferable way to work with Docker on Mac OS X is to use the _[Docker for Mac|https://docs.docker.com/docker-for-mac/docker-toolbox/]_ native application, but Spark's {{docker-integration-tests}} don't run with _Docker for Mac_. The [work-around|https://github.com/spotify/docker-client#a-note-on-using-docker-for-mac] of setting {{DOCKER_HOST=unix:///var/run/docker.sock}} is not the most straightforward find.
> We should upgrade the [spotify docker-client|https://mvnrepository.com/artifact/com.spotify/docker-client] dependency from 3.6.6 to 4.0.8 or later, which supports "Docker for Mac" out of the box.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17803) Docker integration tests don't run with "Docker for Mac"
Christian Kadner created SPARK-17803:

Summary: Docker integration tests don't run with "Docker for Mac"
Key: SPARK-17803
URL: https://issues.apache.org/jira/browse/SPARK-17803
Project: Spark
Issue Type: Dependency upgrade
Components: Tests
Affects Versions: 2.0.1
Environment: Mac OS X 10.10.5, Docker for Mac
Reporter: Christian Kadner
Priority: Minor

The {{docker-integration-tests}} can be run on Mac OS X with _[Docker Toolbox|https://docs.docker.com/toolbox/overview/]_. However, _Docker Toolbox_ comes with a cumbersome setup (VirtualBox VM running a _boot2docker_ Linux distro, tricky to do port mapping, ...).

A much preferable way to work with Docker on Mac OS X is to use the _[Docker for Mac|https://docs.docker.com/docker-for-mac/docker-toolbox/]_ native application, but Spark's {{docker-integration-tests}} don't run with _Docker for Mac_. The [work-around|https://github.com/spotify/docker-client#a-note-on-using-docker-for-mac] of setting {{DOCKER_HOST=unix:///var/run/docker.sock}} is not the most straightforward find.

We should upgrade the [spotify docker-client|https://mvnrepository.com/artifact/com.spotify/docker-client] dependency from 3.6.6 to 4.0.8 or later, which supports "Docker for Mac" out of the box.
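The work-around referenced above can be sketched as a small shell snippet; the socket path is the default local Docker daemon socket, and the commented test invocation is illustrative only (module and profile names vary by Spark version):

```shell
# Work-around for "Docker for Mac" (per the spotify/docker-client note):
# point the docker-client at the local daemon's Unix socket instead of a TCP host.
export DOCKER_HOST=unix:///var/run/docker.sock
echo "DOCKER_HOST=$DOCKER_HOST"

# Then run the docker integration tests, e.g. (illustrative, version-dependent):
# build/mvn -Pdocker-integration-tests -pl external/docker-integration-tests test
```

Upgrading the docker-client dependency as proposed would make this environment variable unnecessary.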
[jira] [Commented] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979644#comment-14979644 ]

Christian Kadner commented on SPARK-10890:
--

Hi [~rhillegas], since the `org.h2.jdbc.JdbcSQLException` is logged by `org.apache.spark.executor.Executor` and again by `org.apache.spark.scheduler.TaskSetManager`, we could try to influence their log behavior / log levels. The easiest way to suppress the logging of the expected exception is to temporarily increase the log threshold to `FATAL` while the test case is executed and restore the original log level right after. I realize this is a _"hack"_ with the side effect of suppressing any other log message of less than FATAL severity during the test case execution. Discussion invited :-)

> "Column count does not match; SQL statement:" error in JDBCWriteSuite
> -
>
> Key: SPARK-10890
> URL: https://issues.apache.org/jira/browse/SPARK-10890
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.5.0
> Reporter: Rick Hillegas
>
> I get the following error when I run the following test...
> mvn -Dhadoop.version=2.4.0 -DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test
> {noformat}
> JDBCWriteSuite:
> 13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
> - Basic CREATE
> - CREATE with overwrite
> - CREATE then INSERT to append
> - CREATE then INSERT to truncate
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 23.0 (TID 31)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
> at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
> at org.h2.message.DbException.get(DbException.java:179)
> at org.h2.message.DbException.get(DbException.java:155)
> at org.h2.message.DbException.get(DbException.java:144)
> at org.h2.command.dml.Insert.prepare(Insert.java:265)
> at org.h2.command.Parser.prepareCommand(Parser.java:247)
> at org.h2.engine.Session.prepareLocal(Session.java:446)
> at org.h2.engine.Session.prepareCommand(Session.java:388)
> at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
> at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
> at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 23.0 (TID 32)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
> at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
> at org.h2.message.DbException.get(DbException.java:179)
> at org.h2.message.DbException.get(DbException.java:155)
> at org.h2.message.DbException.get(DbException.java:144)
> at org.h2.command.dml.Insert.prepare(Insert.java:265)
> at org.h2.command.Parser.prepareCommand(Parser.java:247)
> at org.h2.engine.Session.prepareLocal(Session.java:446)
> at org.h2.engine.Session.prepareCommand(Session.java:388)
> at
[jira] [Comment Edited] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979644#comment-14979644 ]

Christian Kadner edited comment on SPARK-10890 at 10/29/15 1:52 AM:

Hi [~rhillegas], since the {{org.h2.jdbc.JdbcSQLException}} is logged by {{org.apache.spark.executor.Executor}} and again by {{org.apache.spark.scheduler.TaskSetManager}}, we could try to influence their log behavior / log levels. The easiest way to suppress the logging of the expected exception is to temporarily increase the log threshold to {{FATAL}} while the test case is executed and restore the original log level right after. I realize this is a _"hack"_ with the side effect of suppressing any other log message of less than FATAL severity during the test case execution. Discussion invited :-)

was (Author: ckadner): Hi [~rhillegas] since the `org.h2.jdbc.JdbcSQLException` is logged by `org.apache.spark.executor.Executor` and again by `org.apache.spark.scheduler.TaskSetManager`, we could try to influence their log behavior / log levels. The easiest way to suppress the logging of the expected exception is to temporarily increase the log threshold to `FATAL` while the test case is executed and restore the original log level right after. I realize this is a _"hack"_ with the side effect of suppressing any other log message of less than FATAL severity during the test case execution. Discussion invited :-)

> "Column count does not match; SQL statement:" error in JDBCWriteSuite
> -
>
> Key: SPARK-10890
> URL: https://issues.apache.org/jira/browse/SPARK-10890
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.5.0
> Reporter: Rick Hillegas
>
> I get the following error when I run the following test...
> mvn -Dhadoop.version=2.4.0 -DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test
> {noformat}
> JDBCWriteSuite:
> 13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
> - Basic CREATE
> - CREATE with overwrite
> - CREATE then INSERT to append
> - CREATE then INSERT to truncate
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 23.0 (TID 31)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
> at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
> at org.h2.message.DbException.get(DbException.java:179)
> at org.h2.message.DbException.get(DbException.java:155)
> at org.h2.message.DbException.get(DbException.java:144)
> at org.h2.command.dml.Insert.prepare(Insert.java:265)
> at org.h2.command.Parser.prepareCommand(Parser.java:247)
> at org.h2.engine.Session.prepareLocal(Session.java:446)
> at org.h2.engine.Session.prepareCommand(Session.java:388)
> at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
> at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
> at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 23.0 (TID 32)
>
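The raise-then-restore pattern proposed in the comment above can be sketched as follows. To keep the example self-contained it uses {{java.util.logging}} rather than log4j (which is what Spark's test suites actually configure), so the logger names and levels are illustrative, not the suite's real code; the log4j equivalent would use {{Logger.getLevel}}/{{Logger.setLevel}} with {{Level.FATAL}}:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SuppressExpectedException {

    /**
     * Runs an action with the logger's threshold temporarily raised,
     * restoring the original level afterwards, even if the action throws.
     */
    static void withLogLevel(Logger logger, Level temporary, Runnable action) {
        Level original = logger.getLevel();  // may be null (level inherited from parent)
        logger.setLevel(temporary);
        try {
            action.run();
        } finally {
            logger.setLevel(original);       // restore whether or not the action failed
        }
    }

    public static void main(String[] args) {
        // Illustrative logger name, mirroring the logger that emits the expected error.
        Logger executorLog = Logger.getLogger("org.apache.spark.executor.Executor");
        withLogLevel(executorLog, Level.OFF, () ->
            // the expected exception would normally be logged here;
            // with the threshold at OFF it is suppressed
            executorLog.severe("Column count does not match (expected in this test)")
        );
        System.out.println("level restored: " + (executorLog.getLevel() == null));
    }
}
```

The {{finally}} block is the important part: without it, a failing assertion inside the test case would leave the suite's logging silenced for every subsequent test.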
[jira] [Updated] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
[ https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kadner updated SPARK-11338:
-
Description:

Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). This makes it impossible/impractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} environment variable is set.

*Repro steps:*
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path (i.e. {{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$ sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the application link in the HTML source text:
{code:xml|borderColor=#c00}
... App ID App Name ... local-1445896187531 Spark shell ...
{code}
*Notice*, the application link "{{/history/local-1445896187531}}" does _not_ have the prefix {{/testwebuiproxy/..}}

All site-relative links (URLs starting with {{"/"}}) should have been prepended with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
{code:xml|borderColor=#0c0}
... App ID App Name ... local-1445896187531 Spark shell ...
{code}

was: Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). This makes it impossible/unpractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set. *Repro steps:*\\ # Configure history log collection: {code:title=conf/spark-defaults.conf|borderStyle=solid} spark.eventLog.enabled true spark.eventLog.dir logs/history spark.history.fs.logDirectory logs/history {code} ...create the logs folders: {code} $ mkdir -p logs/history {code} # Start the Spark shell and run the word count example: {code:java|borderStyle=solid} $ bin/spark-shell ... scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).collect scala> sc.stop {code} # Set the web proxy root path path (i.e. {{/testwebuiproxy/..}}): {code} $ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/.. {code} # Start the history server: {code} $ sbin/start-history-server.sh {code} # Bring up the History Server web UI at {{localhost:18080}} and view the application link in the HTML source text: {code:xml|borderColor=#c00} ... App IDApp Name... local-1445896187531Spark shell ... {code} *Notice*, application link "{{/history/local-1445896187531}}" does _not_ have the prefix {{/testwebuiproxy/..}} \\ \\ All site-relative links (URL starting with {{"/"}}) should have been prepended with the uiRoot prefix {{/testwebuiproxy/..}} like this ... {code:xml|borderColor=#0c0} ... App IDApp Name... local-1445896187531Spark shell ... {code}

> HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
> --
>
> Key: SPARK-11338
> URL: https://issues.apache.org/jira/browse/SPARK-11338
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Reporter: Christian Kadner
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). This makes it impossible/impractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} environment variable is set.
> *Repro steps:*
> # Configure history log collection:
> {code:title=conf/spark-defaults.conf|borderStyle=solid}
> spark.eventLog.enabled true
> spark.eventLog.dir logs/history
>
[jira] [Commented] (SPARK-4836) Web UI should display separate information for all stage attempts
[ https://issues.apache.org/jira/browse/SPARK-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977551#comment-14977551 ]

Christian Kadner commented on SPARK-4836:
-

Hi [~joshrosen], is this still a problem? And if so, do you have a somewhat _"reliable"_ repro scenario or a nifty way to fake this:
{quote}"...(job) lost some partitions of that stage and had to run a new stage attempt to recompute one or two tasks from that stage..."{quote}

> Web UI should display separate information for all stage attempts
> -
>
> Key: SPARK-4836
> URL: https://issues.apache.org/jira/browse/SPARK-4836
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.1.1, 1.2.0
> Reporter: Josh Rosen
>
> I've run into some cases where the web UI job page will say that a job took 12 minutes but the sum of that job's stage times is something like 10 seconds. In this case, it turns out that my job ran a stage to completion (which took, say, 5 minutes), then lost some partitions of that stage and had to run a new stage attempt to recompute one or two tasks from that stage. As a result, the latest attempt for that stage reports only one or two tasks.
> In the web UI, it seems that we only show the latest stage attempt, not all attempts, which can lead to confusing / misleading displays for jobs with failed / partially-recomputed stages.
[jira] [Created] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
Christian Kadner created SPARK-11338:

Summary: HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
Key: SPARK-11338
URL: https://issues.apache.org/jira/browse/SPARK-11338
Project: Spark
Issue Type: Bug
Components: Web UI
Reporter: Christian Kadner

Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). This makes it impossible/impractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.

*Repro steps:*
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path ({{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$ sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the application link in the HTML source text:
{code:xml|borderColor=#c00}
... App ID App Name ... local-1445896187531 Spark shell ...
{code}
*Notice*, the application link does _not_ have the prefix {{/testwebuiproxy/..}}

All site-relative links (URLs starting with {{"/"}}) should have been prepended with the uiRoot prefix {color:red}{{/testwebuiproxy/..}}{color} like this ...
{code:xml|borderColor=#0c0}
... App ID App Name ... local-1445896187531 Spark shell ...
{code}
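The fix the report calls for, prepending the proxy base to site-relative links, can be sketched as below. This is a hypothetical illustration, not Spark's actual {{HistoryPage}} code (Spark's other UI pages apply the prefix via a helper in {{UIUtils}}); the class and method names here are made up for the example:

```java
public class UiProxyBase {

    /**
     * Prepends the proxy base (the uiRoot from APPLICATION_WEB_PROXY_BASE)
     * to site-relative links; other links are left untouched.
     */
    static String prependBaseUri(String uiRoot, String link) {
        if (link.startsWith("/")) {
            return uiRoot + link;   // site-relative: needs the proxy prefix
        }
        return link;                // absolute URL or page-relative: leave as-is
    }

    public static void main(String[] args) {
        String uiRoot = "/testwebuiproxy/..";  // the proxy base from the repro steps
        System.out.println(prependBaseUri(uiRoot, "/history/local-1445896187531"));
        System.out.println(prependBaseUri(uiRoot, "https://spark.apache.org/"));
    }
}
```

With the repro's proxy base, the application link would be rendered as {{/testwebuiproxy/../history/local-1445896187531}}, which is what the report says all other Spark web UI pages already do.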
[jira] [Updated] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
[ https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-11338: - Description: Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). This makes it impossible/impractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set. *Repro steps:*\\ # Configure history log collection: {code:title=conf/spark-defaults.conf|borderStyle=solid} spark.eventLog.enabled true spark.eventLog.dir logs/history spark.history.fs.logDirectory logs/history {code} ...create the logs folders: {code} $ mkdir -p logs/history {code} # Start the Spark shell and run the word count example: {code:java|borderStyle=solid} $ bin/spark-shell ... scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).collect scala> sc.stop {code} # Set the web proxy root path (i.e. {{/testwebuiproxy/..}}): {code} $ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/.. {code} # Start the history server: {code} $ sbin/start-history-server.sh {code} # Bring up the History Server web UI at {{localhost:18080}} and view the application link in the HTML source text: {code:xml|borderColor=#c00} ... App IDApp Name... local-1445896187531Spark shell ... {code} *Notice*, the application link "{{/history/local-1445896187531}}" does _not_ have the prefix {{/testwebuiproxy/..}} \\ \\ All site-relative links (URL starting with {{"/"}}) should have been prepended with the uiRoot prefix {{/testwebuiproxy/..}} like this ... {code:xml|borderColor=#0c0} ... App IDApp Name... local-1445896187531Spark shell ... {code} was: Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). 
This makes it impossible/impractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set. *Repro steps:*\\ # Configure history log collection: {code:title=conf/spark-defaults.conf|borderStyle=solid} spark.eventLog.enabled true spark.eventLog.dir logs/history spark.history.fs.logDirectory logs/history {code} ...create the logs folders: {code} $ mkdir -p logs/history {code} # Start the Spark shell and run the word count example: {code:java|borderStyle=solid} $ bin/spark-shell ... scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).collect scala> sc.stop {code} # Set the web proxy root path ({{/testwebuiproxy/..}}): {code} $ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/.. {code} # Start the history server: {code} $ sbin/start-history-server.sh {code} # Bring up the History Server web UI at {{localhost:18080}} and view the application link in the HTML source text: {code:xml|borderColor=#c00} ... App IDApp Name... local-1445896187531Spark shell ... {code} *Notice*, the application link does _not_ have the prefix {{/testwebuiproxy/..}} \\ \\ All site-relative links (URL starting with {{"/"}}) should have been prepended with the uiRoot prefix {color:red}{{/testwebuiproxy/..}}{color} like this ... {code:xml|borderColor=#0c0} ... App IDApp Name... local-1445896187531Spark shell ... {code} > HistoryPage not multi-tenancy enabled (app links not prefixed with > APPLICATION_WEB_PROXY_BASE) > -- > > Key: SPARK-11338 > URL: https://issues.apache.org/jira/browse/SPARK-11338 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Christian Kadner > Original Estimate: 48h > Remaining Estimate: 48h > > Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export > APPLICATION_WEB_PROXY_BASE=}}). 
This makes it > impossible/impractical to expose the *History Server* in a multi-tenancy > environment where each Spark service instance has one history server behind a > multi-tenant enabled proxy server. All other Spark web UI pages are > correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set. > *Repro steps:*\\ > # Configure history log collection: > {code:title=conf/spark-defaults.conf|borderStyle=solid} > spark.eventLog.enabled true > spark.eventLog.dir logs/history > spark.history.fs.logDirectory logs/history > {code} >
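The fix the issue asks for can be sketched as a tiny helper that prefixes every site-relative link with the proxy base before it is rendered. This is a minimal illustration of the behavior, not Spark's actual HistoryPage code; the class and method names are hypothetical.

```java
// Sketch: every site-relative href emitted by HistoryPage should be
// prefixed with uiRoot (i.e. APPLICATION_WEB_PROXY_BASE).
// Class and method names are illustrative, not Spark's actual API.
public class ProxyBasePrefix {

    /** Prepend uiRoot to site-relative URLs; leave absolute URLs untouched. */
    static String prependBaseUri(String uiRoot, String href) {
        return href.startsWith("/") ? uiRoot + href : href;
    }

    public static void main(String[] args) {
        // In a real deployment uiRoot would come from APPLICATION_WEB_PROXY_BASE.
        String uiRoot = "/testwebuiproxy/..";
        // The app link from the repro becomes proxy-safe:
        System.out.println(prependBaseUri(uiRoot, "/history/local-1445896187531"));
        // prints /testwebuiproxy/../history/local-1445896187531
    }
}
```

Applying this to each application link on the page is what makes the History Server usable behind a multi-tenant proxy, matching the behavior of the other Spark web UI pages.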
[jira] [Created] (SPARK-9211) HiveComparisonTest generates incorrect file name for golden answer files on Windows
Christian Kadner created SPARK-9211: --- Summary: HiveComparisonTest generates incorrect file name for golden answer files on Windows Key: SPARK-9211 URL: https://issues.apache.org/jira/browse/SPARK-9211 Project: Spark Issue Type: Test Components: SQL, Windows Affects Versions: 1.4.1 Environment: Windows Reporter: Christian Kadner Priority: Minor The names of the golden answer files for the Hive test cases (test suites based on {{HiveComparisonTest}}) are generated using an MD5 hash of the query text. When the query text contains line breaks then the generated MD5 hash differs between Windows and Linux/OSX ({{\r\n}} vs {{\n}}). This results in erroneously created golden answer files from just running a Hive comparison test and makes it impossible to modify or add new test cases with correctly named golden answer files on Windows.
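The platform dependence described above comes down to hashing the raw query bytes. A sketch of the obvious normalization fix, hashing only after converting CRLF to LF, is below; this demonstrates the general technique and is not HiveComparisonTest's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Demonstrates why the golden-answer file names differ between Windows and
// Linux/OSX: MD5 over the raw bytes is sensitive to \r\n vs \n. Normalizing
// line endings before hashing makes the name platform independent.
public class QueryHash {

    static String md5Hex(String text) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    /** Normalize Windows line endings so the hash is platform independent. */
    static String stableName(String query) throws Exception {
        return md5Hex(query.replace("\r\n", "\n"));
    }

    public static void main(String[] args) throws Exception {
        String unix = "SELECT 1\nFROM t";
        String windows = "SELECT 1\r\nFROM t";
        System.out.println(md5Hex(unix).equals(md5Hex(windows)));         // raw hashes differ
        System.out.println(stableName(unix).equals(stableName(windows))); // normalized hashes match
    }
}
```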
[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support
[ https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-7265: Labels: spark.tc (was: ) Improving documentation for Spark SQL Hive support --- Key: SPARK-7265 URL: https://issues.apache.org/jira/browse/SPARK-7265 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Trivial Labels: spark.tc Fix For: 1.5.0 miscellaneous documentation improvement for Spark SQL Hive support, Yarn cluster deployment.
[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs
[ https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-2859: Labels: spark.tc (was: ) Update url of Kryo project in related docs -- Key: SPARK-2859 URL: https://issues.apache.org/jira/browse/SPARK-2859 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Guancheng Chen Assignee: Guancheng Chen Priority: Trivial Labels: spark.tc Fix For: 1.0.3, 1.1.0 Kryo project has been migrated from googlecode to github, hence we need to update its URL in related docs such as tuning.md.
[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md
[ https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-8639: Labels: spark.tc (was: ) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md - Key: SPARK-8639 URL: https://issues.apache.org/jira/browse/SPARK-8639 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Rosstin Murphy Assignee: Rosstin Murphy Priority: Trivial Labels: spark.tc Fix For: 1.4.1, 1.5.0 In docs/README.md, the text states around line 31 Execute 'jekyll' from the 'docs/' directory. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files. It might be more clear if we said Execute 'jekyll build' from the 'docs/' directory to compile the site. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files. In docs/api.md: Here you can API docs for Spark and its submodules. should be something like: Here you can read API docs for Spark and its submodules.
[jira] [Updated] (SPARK-5562) LDA should handle empty documents
[ https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-5562: Labels: spark.tc (was: starter) LDA should handle empty documents - Key: SPARK-5562 URL: https://issues.apache.org/jira/browse/SPARK-5562 Project: Spark Issue Type: Test Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Assignee: Alok Singh Priority: Minor Labels: spark.tc, starter Fix For: 1.5.0 Original Estimate: 96h Remaining Estimate: 96h Latent Dirichlet Allocation (LDA) could easily be given empty documents when people select a small vocabulary. We should check to make sure it is robust to empty documents. This will hopefully take the form of a unit test, but may require modifying the LDA implementation.
[jira] [Updated] (SPARK-7357) Improving HBaseTest example
[ https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-7357: Labels: spark.tc (was: ) Improving HBaseTest example --- Key: SPARK-7357 URL: https://issues.apache.org/jira/browse/SPARK-7357 Project: Spark Issue Type: Improvement Components: Examples Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Minor Labels: spark.tc Fix For: 1.5.0 Original Estimate: 2m Remaining Estimate: 2m Minor improvement to the HBaseTest example: when HBase-related configurations, e.g. zookeeper quorum, zookeeper client port, or zookeeper.znode.parent, are not set to the default (localhost:2181), the connection to zookeeper might hang as shown in the following stack 15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxx.xxx.xxx:2181 sessionTimeout=9 watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown error) 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to xxx.xxx.xxx/9.30.94.121:2181, initiating session 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, negotiated timeout = 4 15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null This is because hbase-site.xml is not placed on the Spark classpath.
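Since the hang stems from hbase-site.xml silently missing from the classpath (HBase then falls back to localhost:2181), a cheap guard is to check for the resource before attempting the connection. This is only a sketch of that guard; the resource name "hbase-site.xml" is the file HBase configuration loading looks for, everything else here is illustrative.

```java
// Guard against the silent localhost:2181 fallback described above: verify
// that hbase-site.xml is actually visible on the classpath before connecting.
// Sketch only; the class name and error message are illustrative.
public class HBaseConfigCheck {

    /** True if the named resource can be found on the current classpath. */
    static boolean onClasspath(String resource) {
        return HBaseConfigCheck.class.getClassLoader().getResource(resource) != null;
    }

    public static void main(String[] args) {
        if (!onClasspath("hbase-site.xml")) {
            System.err.println("hbase-site.xml not found on the classpath; "
                    + "HBase will default to a zookeeper quorum of "
                    + "localhost:2181 and the connection may hang");
        }
    }
}
```

Failing fast with a message like this is friendlier than the ZooKeeper client retry loop shown in the log above.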
[jira] [Updated] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly
[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-6785: Labels: spark.tc (was: ) DateUtils can not handle date before 1970/01/01 correctly - Key: SPARK-6785 URL: https://issues.apache.org/jira/browse/SPARK-6785 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Christian Kadner Labels: spark.tc Fix For: 1.5.0 {code} scala> val d = new Date(100) d: java.sql.Date = 1969-12-31 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d)) res1: java.sql.Date = 1970-01-01 {code}
[jira] [Commented] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)
[ https://issues.apache.org/jira/browse/SPARK-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612584#comment-14612584 ] Christian Kadner commented on SPARK-8746: - Thank you Sean! Need to update download link for Hive 0.13.1 jars (HiveComparisonTest) -- Key: SPARK-8746 URL: https://issues.apache.org/jira/browse/SPARK-8746 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Christian Kadner Assignee: Christian Kadner Priority: Trivial Labels: documentation, test Fix For: 1.4.1, 1.5.0 Original Estimate: 1h Remaining Estimate: 1h The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) describes how to generate golden answer files for new hive comparison test cases. However the download link for the Hive 0.13.1 jars points to https://hive.apache.org/downloads.html but none of the linked mirror sites still has the 0.13.1 version. We need to update the link to https://archive.apache.org/dist/hive/hive-0.13.1/
[jira] [Created] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)
Christian Kadner created SPARK-8746: --- Summary: Need to update download link for Hive 0.13.1 jars (HiveComparisonTest) Key: SPARK-8746 URL: https://issues.apache.org/jira/browse/SPARK-8746 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Christian Kadner Priority: Trivial The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) describes how to generate golden answer files for new hive comparison test cases. However the download link for the Hive 0.13.1 jars points to https://hive.apache.org/downloads.html but none of the linked mirror sites still has the 0.13.1 version. We need to update the link to https://archive.apache.org/dist/hive/hive-0.13.1/
[jira] [Comment Edited] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly
[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549366#comment-14549366 ] Christian Kadner edited comment on SPARK-6785 at 6/30/15 9:50 PM: -- {panel:borderStyle=dashed|borderColor=#ccc|bgColor=#CE} Pull Request +[6242|https://github.com/apache/spark/pull/6242]+ {panel} \\ Before my fix, the from-and-to Java date conversion of dates before 1970 will only work for {{java.sql.Date}} objects that reflect a date and time exactly at midnight in the System's local time zone. Otherwise, if the Date's time is just one millisecond before or after midnight, the result of the above conversion will be offset by one day for Dates before 1970 because of a rounding (truncation) flaw in the function {{DateUtils.millisToDays(Long):Int}} \\ {code} scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss") df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime) d1: java.sql.Date = 1969-01-01 scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime) d2: java.sql.Date = 1969-01-01 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1)) res1: java.sql.Date = 1969-01-01 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d2)) res2: java.sql.Date = 1969-01-02 {code} \\ What is the code doing and how to fix it: \\ - A {{java.util.Date}} is represented by milliseconds ({{Long}}) since the Epoch (1970/01/01 0:00:00 GMT) with positive numbers for dates after and negative numbers for dates before 1970 - The function {{DateUtils.fromJavaDate(java.util.Date):Int}} calculates the number of full days passed since 1970/01/01 00:00:00 (local time, not UTC), but by using the data type {{Long}} (as opposed to {{Double}}) when converting milliseconds to days it essentially truncates the fractional part of days passed (disregarding the impact of hours, minutes, seconds) - The function {{DateUtils.toJavaDate(Int):Date}} converts the given number of days into
milliseconds and adds it to 1970/01/01 00:00:00 (local time, not UTC) - _Side note: The time-zone offset from UTC is factored in when converting a Date to days and removed when converting days to Date, so the time-zone shifting is neutralized in the round-trip conversion {{toJavaDate(fromJavaDate(java.util.Date))}}._ - The truncation of partial days is not a problem for dates after 1970 since adding a fraction of a day to any date will not flip the calendar to the next day (since all our Dates start 0:00:00 AM) - That truncation of partial days however is a problem when subtracting even a second from a {{Date}} with time at 0:00:00 AM which should turn the calendar back one day to the previous date - Ideally the date conversion should be done using milliseconds, but since using days has been established already, the fix is to work with {{Double}} to preserve fractions of days and use {{floor()}} instead of the implicit truncate to round to a full number of days ({{Int}}) \\ Pseudo-code example, adding or subtracting 1 hour to/from the Date 1970/01/01 0:00:00 using milliseconds... {code} 1970-01-01 0:00:00 + 1 hr = 1970-01-01 1:00:00 1970-01-01 0:00:00 - 1 hr = 1969-12-31 23:00:00 {code} \\ Same example, using full days. One hour is about 0.04 days. Using {{trunc()}} versus {{floor()}} we get ... {code} trunc(+0.04) = +0 -- 1970-01-01 + 0 days = 1970-01-01 (correct) floor(+0.04) = +0 -- 1970-01-01 + 0 days = 1970-01-01 (correct) trunc(-0.04) = -0 -- 1970-01-01 + -0 days = 1970-01-01 (incorrect, bug) floor(-0.04) = -1 -- 1970-01-01 + -1 day = 1969-12-31 (correct, fix) {code} {code} def trunc(d: Double): Int = d.toInt {code} was (Author: ckadner): {panel:borderStyle=dashed|borderColor=#ccc|bgColor=#CE} Please review only my second Pull Request +[6242|https://github.com/apache/spark/pull/6242]+ and ignore my first Pull Request -[6236|https://github.com/apache/spark/pull/6236]- Thank you! 
{panel} \\ Before my fix, the from-and-to Java date conversion of dates before 1970 will only work for {{java.sql.Date}} objects that reflect a date and time exactly at midnight in the System's local time zone. Otherwise, if the Date's time is just one millisecond before or after midnight, the result of the above conversion will be offset by one day for Dates before 1970 because of a rounding (truncation) flaw in the function {{DateUtils.millisToDays(Long):Int}} \\ {code} scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss") df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime) d1: java.sql.Date = 1969-01-01 scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime) d2: java.sql.Date = 1969-01-01 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1)) res1:
[jira] [Comment Edited] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly
[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549366#comment-14549366 ] Christian Kadner edited comment on SPARK-6785 at 5/18/15 11:21 PM: --- {panel:borderStyle=dashed|borderColor=#ccc|bgColor=#CE} Please review only my second Pull Request +[6242|https://github.com/apache/spark/pull/6242]+ and ignore my first Pull Request -[6236|https://github.com/apache/spark/pull/6236]- Thank you! {panel} \\ Before my fix, the from-and-to Java date conversion of dates before 1970 will only work for {{java.sql.Date}} objects that reflect a date and time exactly at midnight in the System's local time zone. Otherwise, if the Date's time is just one millisecond before or after midnight, the result of the above conversion will be offset by one day for Dates before 1970 because of a rounding (truncation) flaw in the function {{DateUtils.millisToDays(Long):Int}} \\ {code} scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss") df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime) d1: java.sql.Date = 1969-01-01 scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime) d2: java.sql.Date = 1969-01-01 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1)) res1: java.sql.Date = 1969-01-01 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d2)) res2: java.sql.Date = 1969-01-02 {code} \\ What is the code doing and how to fix it: \\ - A {{java.util.Date}} is represented by milliseconds ({{Long}}) since the Epoch (1970/01/01 0:00:00 GMT) with positive numbers for dates after and negative numbers for dates before 1970 - The function {{DateUtils.fromJavaDate(java.util.Date):Int}} calculates the number of full days passed since 1970/01/01 00:00:00 (local time, not UTC), but by using the data type {{Long}} (as opposed to {{Double}}) when converting milliseconds to days it essentially truncates the fractional part of days passed (disregarding the
impact of hours, minutes, seconds) - The function {{DateUtils.toJavaDate(Int):Date}} converts the given number of days into milliseconds and adds it to 1970/01/01 00:00:00 (local time, not UTC) - _Side note: The time-zone offset from UTC is factored in when converting a Date to days and removed when converting days to Date, so the time-zone shifting is neutralized in the round-trip conversion {{toJavaDate(fromJavaDate(java.util.Date))}}._ - The truncation of partial days is not a problem for dates after 1970 since adding a fraction of a day to any date will not flip the calendar to the next day (since all our Dates start 0:00:00 AM) - That truncation of partial days however is a problem when subtracting even a second from a {{Date}} with time at 0:00:00 AM which should turn the calendar back one day to the previous date - Ideally the date conversion should be done using milliseconds, but since using days has been established already, the fix is to work with {{Double}} to preserve fractions of days and use {{floor()}} instead of the implicit truncate to round to a full number of days ({{Int}}) \\ Pseudo-code example, adding or subtracting 1 hour to/from the Date 1970/01/01 0:00:00 using milliseconds... {code} 1970-01-01 0:00:00 + 1 hr = 1970-01-01 1:00:00 1970-01-01 0:00:00 - 1 hr = 1969-12-31 23:00:00 {code} \\ Same example, using full days. One hour is about 0.04 days. Using {{trunc()}} versus {{floor()}} we get ... 
{code} trunc(+0.04) = +0 -- 1970-01-01 + 0 days = 1970-01-01 (correct) floor(+0.04) = +0 -- 1970-01-01 + 0 days = 1970-01-01 (correct) trunc(-0.04) = -0 -- 1970-01-01 + -0 days = 1970-01-01 (incorrect, bug) floor(-0.04) = -1 -- 1970-01-01 + -1 day = 1969-12-31 (correct, fix) {code} {code} def trunc(d: Double): Int = d.toInt {code} DateUtils can not handle date before 1970/01/01 correctly - Key: SPARK-6785 URL: https://issues.apache.org/jira/browse/SPARK-6785 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu {code} scala> val d = new Date(100) d: java.sql.Date = 1969-12-31 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d)) res1: java.sql.Date = 1970-01-01 {code}
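The trunc-versus-floor point made in the comment above is easy to verify in plain Java: integer division truncates toward zero, while Math.floorDiv rounds toward negative infinity, which is exactly the floor() behavior the proposed fix calls for. This is a sketch of the arithmetic, not Spark's actual DateUtils code.

```java
import java.util.concurrent.TimeUnit;

// Demonstrates the rounding flaw: converting millis-since-epoch to whole days
// with integer division truncates toward zero, which is wrong for negative
// values (dates before 1970). Math.floorDiv floors instead, giving the fix.
public class MillisToDays {
    static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    static long buggyMillisToDays(long millis) {
        return millis / MILLIS_PER_DAY;               // truncates toward zero
    }

    static long fixedMillisToDays(long millis) {
        return Math.floorDiv(millis, MILLIS_PER_DAY); // floors toward -infinity
    }

    public static void main(String[] args) {
        long oneHour = TimeUnit.HOURS.toMillis(1);
        // 1970-01-01 00:00 minus one hour is 1969-12-31 23:00, i.e. day -1.
        System.out.println(buggyMillisToDays(-oneHour)); // 0  (wrong: still "1970-01-01")
        System.out.println(fixedMillisToDays(-oneHour)); // -1 (correct: 1969-12-31)
        // For dates after 1970 both variants agree:
        System.out.println(buggyMillisToDays(oneHour));  // 0
        System.out.println(fixedMillisToDays(oneHour));  // 0
    }
}
```

This mirrors the trunc(-0.04) = 0 versus floor(-0.04) = -1 table in the comment, just with exact integer arithmetic instead of fractional days.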
[jira] [Commented] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly
[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542697#comment-14542697 ] Christian Kadner commented on SPARK-6785: - Hi Patrick, I would like to work on this issue. Seems like the date conversion is thrown off by the time-zone adjustments and the fact that the interchange type is days instead of millis. I am preparing a pull-request which will also include test cases to cover more date conversion scenarios. DateUtils can not handle date before 1970/01/01 correctly - Key: SPARK-6785 URL: https://issues.apache.org/jira/browse/SPARK-6785 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu {code} scala> val d = new Date(100) d: java.sql.Date = 1969-12-31 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d)) res1: java.sql.Date = 1970-01-01 {code}
[jira] [Comment Edited] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540217#comment-14540217 ] Christian Kadner edited comment on SPARK-4128 at 5/12/15 5:04 PM: -- Hi Sean, while there is still a section covering the IntelliJ setup, what is missing are these steps (or an updated version of it) which have to be taken in order to get a successful Make of the project. I needed to do some version of it for 1.3.0, 1.3.1, 1.4.0. part of Patrick's deleted paragraph - start ... At the top of the leftmost pane, make sure the Project/Packages selector is set to Packages. Right click on any package and click “Open Module Settings” - you will be able to modify any of the modules here. A few of the modules need to be modified slightly from the default import. Add sources to the following modules: Under “Sources” tab add a source on the right. spark-hive: add v0.13.1/src/main/scala spark-hive-thriftserver v0.13.1/src/main/scala spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala For spark-yarn click “Add content root” and navigate in the filesystem to yarn/common directory of Spark ... part of Patrick's deleted paragraph - end I suggest to add an updated version of that to the wiki, since some of the Modules are setup in a way that similar non-obvious manual steps are required to make them compile. was (Author: ckadner): Hi Sean, while there is still a section covering the IntelliJ setup, what is missing are these steps, or an updated version of it, which I had to do for 1.3.0, 1.3.1, 1.4.0 in order to get a successful Make of the project. part of Patrick's deleted paragraph - start ... At the top of the leftmost pane, make sure the Project/Packages selector is set to Packages. Right click on any package and click “Open Module Settings” - you will be able to modify any of the modules here. A few of the modules need to be modified slightly from the default import. 
Add sources to the following modules: Under “Sources” tab add a source on the right. spark-hive: add v0.13.1/src/main/scala spark-hive-thriftserver v0.13.1/src/main/scala spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala For spark-yarn click “Add content root” and navigate in the filesystem to yarn/common directory of Spark ... part of Patrick's deleted paragraph - end I suggest to add an updated version of that to the wiki, since some of the Modules are setup in a way that similar non-obvious manual steps are required to make them compile. Create instructions on fully building Spark in Intellij --- Key: SPARK-4128 URL: https://issues.apache.org/jira/browse/SPARK-4128 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.2.0 With some of our more complicated modules, I'm not sure whether Intellij correctly understands all source locations. Also, we might require specifying some profiles for the build to work directly. We should document clearly how to start with vanilla Spark master and get the entire thing building in Intellij.
[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540217#comment-14540217 ] Christian Kadner commented on SPARK-4128: - Hi Sean, while there is still a section covering the IntelliJ setup, what is missing are these steps, or an updated version of it, which I had to do for 1.3.0, 1.3.1, 1.4.0 in order to get a successful Make of the project. part of Patrick's deleted paragraph - start ... At the top of the leftmost pane, make sure the Project/Packages selector is set to Packages. Right click on any package and click “Open Module Settings” - you will be able to modify any of the modules here. A few of the modules need to be modified slightly from the default import. Add sources to the following modules: Under “Sources” tab add a source on the right. spark-hive: add v0.13.1/src/main/scala spark-hive-thriftserver v0.13.1/src/main/scala spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala For spark-yarn click “Add content root” and navigate in the filesystem to yarn/common directory of Spark part of Patrick's deleted paragraph - end I suggest to add an updated version of that to the wiki, since some of the Modules are setup in a way that similar non-obvious manual steps are required to make them compile. Create instructions on fully building Spark in Intellij --- Key: SPARK-4128 URL: https://issues.apache.org/jira/browse/SPARK-4128 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.2.0 With some of our more complicated modules, I'm not sure whether Intellij correctly understands all source locations. Also, we might require specifying some profiles for the build to work directly. We should document clearly how to start with vanilla Spark master and get the entire thing building in Intellij. 
[jira] [Comment Edited] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540217#comment-14540217 ] Christian Kadner edited comment on SPARK-4128 at 5/12/15 5:02 PM: -- Hi Sean, while there is still a section covering the IntelliJ setup, what is missing are these steps, or an updated version of it, which I had to do for 1.3.0, 1.3.1, 1.4.0 in order to get a successful Make of the project. part of Patrick's deleted paragraph - start ... At the top of the leftmost pane, make sure the Project/Packages selector is set to Packages. Right click on any package and click “Open Module Settings” - you will be able to modify any of the modules here. A few of the modules need to be modified slightly from the default import. Add sources to the following modules: Under “Sources” tab add a source on the right. spark-hive: add v0.13.1/src/main/scala spark-hive-thriftserver v0.13.1/src/main/scala spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala For spark-yarn click “Add content root” and navigate in the filesystem to yarn/common directory of Spark ... part of Patrick's deleted paragraph - end I suggest to add an updated version of that to the wiki, since some of the Modules are setup in a way that similar non-obvious manual steps are required to make them compile. was (Author: ckadner): Hi Sean, while there is still a section covering the IntelliJ setup, what is missing are these steps, or an updated version of it, which I had to do for 1.3.0, 1.3.1, 1.4.0 in order to get a successful Make of the project. part of Patrick's deleted paragraph - start ... At the top of the leftmost pane, make sure the Project/Packages selector is set to Packages. Right click on any package and click “Open Module Settings” - you will be able to modify any of the modules here. A few of the modules need to be modified slightly from the default import. 
Add sources to the following modules: Under “Sources” tab add a source on the right. spark-hive: add v0.13.1/src/main/scala spark-hive-thriftserver v0.13.1/src/main/scala spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala For spark-yarn click “Add content root” and navigate in the filesystem to yarn/common directory of Spark part of Patrick's deleted paragraph - end I suggest to add an updated version of that to the wiki, since some of the Modules are setup in a way that similar non-obvious manual steps are required to make them compile. Create instructions on fully building Spark in Intellij --- Key: SPARK-4128 URL: https://issues.apache.org/jira/browse/SPARK-4128 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.2.0 With some of our more complicated modules, I'm not sure whether Intellij correctly understands all source locations. Also, we might require specifying some profiles for the build to work directly. We should document clearly how to start with vanilla Spark master and get the entire thing building in Intellij. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
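The manual “add a source” step described above ends up recorded as a sourceFolder entry in the affected module's .iml file. As a rough illustration only (the exact attributes IntelliJ writes vary by IDEA version, and the paths here just mirror the spark-hive example above), the result looks something like:

```xml
<!-- Illustrative sketch of a spark-hive module .iml content section after
     manually adding the Hive 0.13.1 profile sources in Module Settings.
     Attribute details are assumptions, not copied from an actual Spark .iml. -->
<content url="file://$MODULE_DIR$">
  <sourceFolder url="file://$MODULE_DIR$/src/main/scala" isTestSource="false" />
  <!-- the profile-dependent folder that must be added by hand -->
  <sourceFolder url="file://$MODULE_DIR$/v0.13.1/src/main/scala" isTestSource="false" />
</content>
```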
[jira] [Comment Edited] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540502#comment-14540502 ] Christian Kadner edited comment on SPARK-4128 at 5/12/15 7:11 PM:
--
Yes, I encountered these compile problems after a fresh import of the Spark 1.3.0 and 1.3.1 projects from download (.tgz), and of 1.4 when loaded from a Git repository. For Scala 2.10/2.11 support, I suppose either one should be chosen by default without having to run a script. Btw, that should be doc'd as well ;-)

was (Author: ckadner):
Yes, I encountered these compile problems after a fresh import of the Spark 1.4 project both when downloaded (tar/zip) and when loaded from a Git repository. For Scala 2.10/2.11 support, I suppose either one should be chosen by default without having to run a script. Btw, that should be doc'd as well ;-)
[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540397#comment-14540397 ] Christian Kadner commented on SPARK-4128:
--
Not every user may care about each of the modules, and yes, these instructions may need to be revised. Yet I strongly think there should be some general text, maybe under Other Tips, that explains the need to manually update the Module settings to mark additional folders as Source folders (after selecting the right combination of Profiles and doing a Generate Sources). For spark-hive this seems to still be true. Patrick had written this comment in one of his emails, which is helpful to understand why that needs to be done:

In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11 and 2.10. In these cases, you may need to add source locations explicitly to intellij if you want the entire project to compile there. Unfortunately as long as we support cross-building like this, it will be an issue. Intellij's maven support does not correctly detect our use of the maven-build-plugin to add source directories.

Besides fixing the module settings for spark-hive, I had to change the flume-sink module settings to mark the target\scala-2.10\src_managed\main\compiled_avro folder as an additional Source Folder.
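The “pluggable source directories” Patrick describes are wired into the Maven build through the build-helper plugin. As a hedged sketch of what such a profile can look like (the profile id and source path are illustrative, not copied from Spark's actual pom.xml), this is the pattern IntelliJ's Maven import may fail to pick up:

```xml
<!-- Sketch: a Maven profile adding an extra source directory via
     build-helper-maven-plugin. IntelliJ may not register this folder
     as a source root automatically, hence the manual module fixes. -->
<profile>
  <id>hive-0.13.1</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>add-hive-sources</id>
            <phase>generate-sources</phase>
            <goals><goal>add-source</goal></goals>
            <configuration>
              <sources>
                <source>v0.13.1/src/main/scala</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```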
[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540727#comment-14540727 ] Christian Kadner commented on SPARK-4128:
--
Hi Sean, based on what Patrick described, I would propose this text under IDE Setup > IntelliJ > Other Tips:

<!-- start -->
Some of the modules have pluggable source directories based on Maven profiles (i.e. to support both Scala 2.11 and 2.10 or to allow cross-building against different versions of Hive). In some cases IntelliJ does not correctly detect our use of the build-helper-maven-plugin to add source directories. In these cases, you may need to add source locations explicitly to compile the entire project:
- open the Project Settings and select Modules
- based on your selected Maven profiles, you may need to add source folders to the following modules:
  - spark-hive: add v0.13.1/src/main/scala
  - spark-streaming-flume-sink: add target\scala-2.10\src_managed\main\compiled_avro
<!-- end -->

In addition we could quote the compilation errors, so other developers will find this solution when they search the web to troubleshoot these issues.
[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540776#comment-14540776 ] Christian Kadner commented on SPARK-4128:
--
Thank you Sean!
[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539122#comment-14539122 ] Christian Kadner commented on SPARK-4128:
--
Hi Patrick, I recently set up my IntelliJ IDEA development environment for Apache Spark, and I struggled with a few of the same/similar compilation errors that were described in this email thread: https://www.mail-archive.com/dev@spark.apache.org/msg06070.html. You had added a helpful paragraph to the wiki on Nov 20, 2014, but you removed it again on Jan 9, 2015. Did you find a better solution or a more pertinent place to put this information?