[jira] [Closed] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite

2017-01-15 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner closed SPARK-10890.

   Resolution: Fixed
Fix Version/s: 2.2.0
   2.1.1

This issue is no longer reproducible. It still happened in v2.1.0-rc5.

> "Column count does not match; SQL statement:" error in JDBCWriteSuite
> -
>
> Key: SPARK-10890
> URL: https://issues.apache.org/jira/browse/SPARK-10890
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Rick Hillegas
> Fix For: 2.1.1, 2.2.0
>
>
> I get the following error when I run the following test...
> mvn -Dhadoop.version=2.4.0 
> -DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test
> {noformat}
> JDBCWriteSuite:
> 13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name 
> DAGScheduler for source because spark.app.id is not set.
> - Basic CREATE
> - CREATE with overwrite
> - CREATE then INSERT to append
> - CREATE then INSERT to truncate
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 
> in stage 23.0 (TID 31)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
>   at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
>   at org.h2.message.DbException.get(DbException.java:179)
>   at org.h2.message.DbException.get(DbException.java:155)
>   at org.h2.message.DbException.get(DbException.java:144)
>   at org.h2.command.dml.Insert.prepare(Insert.java:265)
>   at org.h2.command.Parser.prepareCommand(Parser.java:247)
>   at org.h2.engine.Session.prepareLocal(Session.java:446)
>   at org.h2.engine.Session.prepareCommand(Session.java:388)
>   at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
>   at 
> org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
>   at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 
> in stage 23.0 (TID 32)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
>   at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
>   at org.h2.message.DbException.get(DbException.java:179)
>   at org.h2.message.DbException.get(DbException.java:155)
>   at org.h2.message.DbException.get(DbException.java:144)
>   at org.h2.command.dml.Insert.prepare(Insert.java:265)
>   at org.h2.command.Parser.prepareCommand(Parser.java:247)
>   at org.h2.engine.Session.prepareLocal(Session.java:446)
>   at org.h2.engine.Session.prepareCommand(Session.java:388)
>   at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
>   at 
> org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
>   at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
>   at 
> 

[jira] [Updated] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite

2017-01-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-10890:
-
Description: 
I get the following error when I run the following test...

mvn -Dhadoop.version=2.4.0 
-DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test

{noformat}
JDBCWriteSuite:
13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name 
DAGScheduler for source because spark.app.id is not set.
- Basic CREATE
- CREATE with overwrite
- CREATE then INSERT to append
- CREATE then INSERT to truncate
13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 23.0 (TID 31)
org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.message.DbException.get(DbException.java:144)
at org.h2.command.dml.Insert.prepare(Insert.java:265)
at org.h2.command.Parser.prepareCommand(Parser.java:247)
at org.h2.engine.Session.prepareLocal(Session.java:446)
at org.h2.engine.Session.prepareCommand(Session.java:388)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
at 
org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in 
stage 23.0 (TID 32)
org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.message.DbException.get(DbException.java:144)
at org.h2.command.dml.Insert.prepare(Insert.java:265)
at org.h2.command.Parser.prepareCommand(Parser.java:247)
at org.h2.engine.Session.prepareLocal(Session.java:446)
at org.h2.engine.Session.prepareCommand(Session.java:388)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
at 
org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
at 

[jira] [Updated] (SPARK-17803) Docker integration tests don't run with "Docker for Mac"

2016-10-06 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-17803:
-
Description: 
The {{docker-integration-tests}} can be run on Mac OS X with _[Docker 
Toolbox|https://docs.docker.com/toolbox/overview/]_. However _Docker Toolbox_ 
comes with a cumbersome setup (VirtualBox VM running a _boot2docker_ Linux 
distro, tricky to do port mapping, ...). 
A much preferable way to work with Docker on Mac OS X is to use the _[Docker 
for Mac|https://docs.docker.com/docker-for-mac/docker-toolbox/]_ native 
application, but Spark's {{docker-integration-tests}} don't run with _Docker for 
Mac_. The 
[work-around|https://github.com/spotify/docker-client#a-note-on-using-docker-for-mac]
 of setting {{DOCKER_HOST=unix:///var/run/docker.sock}} is not easy to find.

We should upgrade the [spotify 
docker-client|https://mvnrepository.com/artifact/com.spotify/docker-client] 
dependency from 3.6.6 to 4.0.8 or later which supports "Docker for Mac" out of 
the box.
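
As a rough illustration (a hypothetical sketch, not code from {{docker-integration-tests}}), a 4.x docker-client picks up the connection details from the environment, so _Docker for Mac_'s socket is found without extra configuration:
{code}
// Hypothetical sketch: docker-client 4.x reads DOCKER_HOST (e.g.
// unix:///var/run/docker.sock used by Docker for Mac) from the environment.
import com.spotify.docker.client.DefaultDockerClient

val docker = DefaultDockerClient.fromEnv().build()
println(docker.ping())   // prints "OK" when the Docker daemon is reachable
docker.close()
{code}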



  was:
The {{docker-integration-tests}} can be run on Mac OS X with _[Docker 
Toolbox|https://docs.docker.com/toolbox/overview/]_. However _Docker Toolbox_ 
comes with a cumbersome setup (VirtualBox VM running boot2docker Linux distro, 
tricky to do port mapping, ...). 
A much preferable way to work with Docker on Mac OS X is to use the _[Docker 
for Mac|https://docs.docker.com/docker-for-mac/docker-toolbox/]_ native 
application, but Spark's {{docker-integration-tests}} don't run with _Docker for 
Mac_. The 
[work-around|https://github.com/spotify/docker-client#a-note-on-using-docker-for-mac]
 of setting {{DOCKER_HOST=unix:///var/run/docker.sock}} is not easy to find.

We should upgrade the [spotify 
docker-client|https://mvnrepository.com/artifact/com.spotify/docker-client] 
dependency from 3.6.6 to 4.0.8 or later which supports "Docker for Mac" out of 
the box.




> Docker integration tests don't run with "Docker for Mac"
> 
>
> Key: SPARK-17803
> URL: https://issues.apache.org/jira/browse/SPARK-17803
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Tests
>Affects Versions: 2.0.1
> Environment: Mac OS X 10.10.5, Docker for Mac
>Reporter: Christian Kadner
>Priority: Minor
>
> The {{docker-integration-tests}} can be run on Mac OS X with _[Docker 
> Toolbox|https://docs.docker.com/toolbox/overview/]_. However _Docker Toolbox_ 
> comes with a cumbersome setup (VirtualBox VM running a _boot2docker_ Linux 
> distro, tricky to do port mapping, ...). 
> A much preferable way to work with Docker on Mac OS X is to use the 
> _[Docker for Mac|https://docs.docker.com/docker-for-mac/docker-toolbox/]_ 
> native application, but Spark's {{docker-integration-tests}} don't run with 
> _Docker for Mac_. The 
> [work-around|https://github.com/spotify/docker-client#a-note-on-using-docker-for-mac]
>  of setting {{DOCKER_HOST=unix:///var/run/docker.sock}} is not easy to find.
> We should upgrade the [spotify 
> docker-client|https://mvnrepository.com/artifact/com.spotify/docker-client] 
> dependency from 3.6.6 to 4.0.8 or later which supports "Docker for Mac" out 
> of the box.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17803) Docker integration tests don't run with "Docker for Mac"

2016-10-06 Thread Christian Kadner (JIRA)
Christian Kadner created SPARK-17803:


 Summary: Docker integration tests don't run with "Docker for Mac"
 Key: SPARK-17803
 URL: https://issues.apache.org/jira/browse/SPARK-17803
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Tests
Affects Versions: 2.0.1
 Environment: Mac OS X 10.10.5, Docker for Mac
Reporter: Christian Kadner
Priority: Minor


The {{docker-integration-tests}} can be run on Mac OS X with _[Docker 
Toolbox|https://docs.docker.com/toolbox/overview/]_. However _Docker Toolbox_ 
comes with a cumbersome setup (VirtualBox VM running boot2docker Linux distro, 
tricky to do port mapping, ...). 
A much preferable way to work with Docker on Mac OS X is to use the _[Docker 
for Mac|https://docs.docker.com/docker-for-mac/docker-toolbox/]_ native 
application, but Spark's {{docker-integration-tests}} don't run with _Docker for 
Mac_. The 
[work-around|https://github.com/spotify/docker-client#a-note-on-using-docker-for-mac]
 of setting {{DOCKER_HOST=unix:///var/run/docker.sock}} is not easy to find.

We should upgrade the [spotify 
docker-client|https://mvnrepository.com/artifact/com.spotify/docker-client] 
dependency from 3.6.6 to 4.0.8 or later which supports "Docker for Mac" out of 
the box.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite

2015-10-28 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979644#comment-14979644
 ] 

Christian Kadner commented on SPARK-10890:
--

Hi [~rhillegas]
since the `org.h2.jdbc.JdbcSQLException` is logged by 
`org.apache.spark.executor.Executor` and again by 
`org.apache.spark.scheduler.TaskSetManager`, we could try to influence their 
log behavior / log levels. 
The easiest way to suppress the logging of the expected exception is to 
temporarily increase the log threshold to `FATAL` while the test case is 
executed and restore the original log level right after. 
I realize this is a _"hack"_ with the side effect of suppressing any other log 
message of less than FATAL severity during the test case execution. Discussion 
invited :-)
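
A minimal sketch of that approach (a hypothetical helper, assuming the log4j 1.x API Spark used at the time) could look like this:
{code}
import org.apache.log4j.{Level, Logger}

// Hypothetical helper (not existing Spark test code): raise the threshold of
// the noisy loggers to FATAL for the duration of `body`, then restore it.
def withLogLevel[T](loggerNames: Seq[String], level: Level)(body: => T): T = {
  val loggers = loggerNames.map(name => Logger.getLogger(name))
  val originalLevels = loggers.map(_.getLevel)   // may be null (inherited level)
  loggers.foreach(_.setLevel(level))
  try body finally loggers.zip(originalLevels).foreach {
    case (logger, original) => logger.setLevel(original)
  }
}

// usage in the test case that expects the JdbcSQLException:
// withLogLevel(Seq("org.apache.spark.executor.Executor",
//                  "org.apache.spark.scheduler.TaskSetManager"), Level.FATAL) {
//   intercept[org.apache.spark.SparkException] { /* incompatible INSERT */ }
// }
{code}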

> "Column count does not match; SQL statement:" error in JDBCWriteSuite
> -
>
> Key: SPARK-10890
> URL: https://issues.apache.org/jira/browse/SPARK-10890
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Rick Hillegas
>
> I get the following error when I run the following test...
> mvn -Dhadoop.version=2.4.0 
> -DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test
> {noformat}
> JDBCWriteSuite:
> 13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name 
> DAGScheduler for source because spark.app.id is not set.
> - Basic CREATE
> - CREATE with overwrite
> - CREATE then INSERT to append
> - CREATE then INSERT to truncate
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 
> in stage 23.0 (TID 31)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
>   at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
>   at org.h2.message.DbException.get(DbException.java:179)
>   at org.h2.message.DbException.get(DbException.java:155)
>   at org.h2.message.DbException.get(DbException.java:144)
>   at org.h2.command.dml.Insert.prepare(Insert.java:265)
>   at org.h2.command.Parser.prepareCommand(Parser.java:247)
>   at org.h2.engine.Session.prepareLocal(Session.java:446)
>   at org.h2.engine.Session.prepareCommand(Session.java:388)
>   at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
>   at 
> org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
>   at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 
> in stage 23.0 (TID 32)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
>   at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
>   at org.h2.message.DbException.get(DbException.java:179)
>   at org.h2.message.DbException.get(DbException.java:155)
>   at org.h2.message.DbException.get(DbException.java:144)
>   at org.h2.command.dml.Insert.prepare(Insert.java:265)
>   at org.h2.command.Parser.prepareCommand(Parser.java:247)
>   at org.h2.engine.Session.prepareLocal(Session.java:446)
>   at org.h2.engine.Session.prepareCommand(Session.java:388)
>   at 

[jira] [Comment Edited] (SPARK-10890) "Column count does not match; SQL statement:" error in JDBCWriteSuite

2015-10-28 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979644#comment-14979644
 ] 

Christian Kadner edited comment on SPARK-10890 at 10/29/15 1:52 AM:


Hi [~rhillegas]
since the {{org.h2.jdbc.JdbcSQLException}} is logged by 
{{org.apache.spark.executor.Executor}} and again by 
{{org.apache.spark.scheduler.TaskSetManager}}, we could try to influence their 
log behavior / log levels. 
The easiest way to suppress the logging of the expected exception is to 
temporarily increase the log threshold to {{FATAL}} while the test case is 
executed and restore the original log level right after. 
I realize this is a _"hack"_ with the side effect of suppressing any other log 
message of less than FATAL severity during the test case execution. Discussion 
invited :-)


was (Author: ckadner):
Hi [~rhillegas]
since the `org.h2.jdbc.JdbcSQLException` is logged by 
`org.apache.spark.executor.Executor` and again by 
`org.apache.spark.scheduler.TaskSetManager`, we could try to influence their 
log behavior / log levels. 
The easiest way to suppress the logging of the expected exception is to 
temporarily increase the log threshold to `FATAL` while the test case is 
executed and restore the original log level right after. 
I realize this is a _"hack"_ with the side effect of suppressing any other log 
message of less than FATAL severity during the test case execution. Discussion 
invited :-)

> "Column count does not match; SQL statement:" error in JDBCWriteSuite
> -
>
> Key: SPARK-10890
> URL: https://issues.apache.org/jira/browse/SPARK-10890
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Rick Hillegas
>
> I get the following error when I run the following test...
> mvn -Dhadoop.version=2.4.0 
> -DwildcardSuites=org.apache.spark.sql.jdbc.JDBCWriteSuite test
> {noformat}
> JDBCWriteSuite:
> 13:22:15.603 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 13:22:16.506 WARN org.apache.spark.metrics.MetricsSystem: Using default name 
> DAGScheduler for source because spark.app.id is not set.
> - Basic CREATE
> - CREATE with overwrite
> - CREATE then INSERT to append
> - CREATE then INSERT to truncate
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 
> in stage 23.0 (TID 31)
> org.h2.jdbc.JdbcSQLException: Column count does not match; SQL statement:
> INSERT INTO TEST.INCOMPATIBLETEST VALUES (?, ?, ?) [21002-183]
>   at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
>   at org.h2.message.DbException.get(DbException.java:179)
>   at org.h2.message.DbException.get(DbException.java:155)
>   at org.h2.message.DbException.get(DbException.java:144)
>   at org.h2.command.dml.Insert.prepare(Insert.java:265)
>   at org.h2.command.Parser.prepareCommand(Parser.java:247)
>   at org.h2.engine.Session.prepareLocal(Session.java:446)
>   at org.h2.engine.Session.prepareCommand(Session.java:388)
>   at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
>   at 
> org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:72)
>   at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.insertStatement(JdbcUtils.scala:72)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:100)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:229)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:228)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$32.apply(RDD.scala:892)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1856)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 13:22:19.312 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 
> in stage 23.0 (TID 32)
> 

[jira] [Updated] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)

2015-10-27 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-11338:
-
Description: 
Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
APPLICATION_WEB_PROXY_BASE=}}). This makes it 
impossible/impractical to expose the *History Server* in a multi-tenancy 
environment where each Spark service instance has one history server behind a 
multi-tenant enabled proxy server.  All other Spark web UI pages are correctly 
prefixed when the {{APPLICATION_WEB_PROXY_BASE}} environment variable is set.

*Repro steps:*\\
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory  logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path (e.g. {{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$  sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the 
application link in the HTML source text:
{code:xml|borderColor=#c00}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}
*Notice*, application link "{{/history/local-1445896187531}}" does _not_ have 
the prefix {{/testwebuiproxy/..}} \\ \\
All site-relative links (URL starting with {{"/"}}) should have been prepended 
with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
{code:xml|borderColor=#0c0}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}

  was:
Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
APPLICATION_WEB_PROXY_BASE=}}). This makes it 
impossible/impractical to expose the *History Server* in a multi-tenancy 
environment where each Spark service instance has one history server behind a 
multi-tenant enabled proxy server.  All other Spark web UI pages are correctly 
prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.

*Repro steps:*\\
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory  logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path (e.g. {{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$  sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the 
application link in the HTML source text:
{code:xml|borderColor=#c00}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}
*Notice*, application link "{{/history/local-1445896187531}}" does _not_ have 
the prefix {{/testwebuiproxy/..}} \\ \\
All site-relative links (URL starting with {{"/"}}) should have been prepended 
with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
{code:xml|borderColor=#0c0}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}


> HistoryPage not multi-tenancy enabled (app links not prefixed with 
> APPLICATION_WEB_PROXY_BASE)
> --
>
> Key: SPARK-11338
> URL: https://issues.apache.org/jira/browse/SPARK-11338
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Christian Kadner
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
> APPLICATION_WEB_PROXY_BASE=}}). This makes it 
> impossible/impractical to expose the *History Server* in a multi-tenancy 
> environment where each Spark service instance has one history server behind a 
> multi-tenant enabled proxy server.  All other Spark web UI pages are 
> correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} environment 
> variable is set.
> *Repro steps:*\\
> # Configure history log collection:
> {code:title=conf/spark-defaults.conf|borderStyle=solid}
> spark.eventLog.enabled true
> spark.eventLog.dir logs/history
> 

[jira] [Commented] (SPARK-4836) Web UI should display separate information for all stage attempts

2015-10-27 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977551#comment-14977551
 ] 

Christian Kadner commented on SPARK-4836:
-

Hi [~joshrosen], is this still a problem? And if so, do you have a somewhat 
_"reliable"_ repro scenario or a nifty way to fake this: {quote}"...(job) lost 
some partitions of that stage and had to run a new stage attempt to recompute 
one or two tasks from that stage..."{quote}

> Web UI should display separate information for all stage attempts
> -
>
> Key: SPARK-4836
> URL: https://issues.apache.org/jira/browse/SPARK-4836
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.1.1, 1.2.0
>Reporter: Josh Rosen
>
> I've run into some cases where the web UI job page will say that a job took 
> 12 minutes but the sum of that job's stage times is something like 10 
> seconds.  In this case, it turns out that my job ran a stage to completion 
> (which took, say, 5 minutes) then lost some partitions of that stage and had 
> to run a new stage attempt to recompute one or two tasks from that stage.  As 
> a result, the latest attempt for that stage reports only one or two tasks.  
> In the web UI, it seems that we only show the latest stage attempt, not all 
> attempts, which can lead to confusing / misleading displays for jobs with 
> failed / partially-recomputed stages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)

2015-10-26 Thread Christian Kadner (JIRA)
Christian Kadner created SPARK-11338:


 Summary: HistoryPage not multi-tenancy enabled (app links not 
prefixed with APPLICATION_WEB_PROXY_BASE)
 Key: SPARK-11338
 URL: https://issues.apache.org/jira/browse/SPARK-11338
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Christian Kadner


Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
APPLICATION_WEB_PROXY_BASE=}}). This makes it 
impossible/impractical to expose the *History Server* in a multi-tenancy 
environment where each Spark service instance has one history server behind a 
multi-tenant enabled proxy server.  All other Spark web UI pages are correctly 
prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.

*Repro steps:*\\
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory  logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path ({{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$  sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the 
application link in the HTML source text:
{code:xml|borderColor=#c00}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}
*Notice*, application link does _not_ have the prefix {{/testwebuiproxy/..}} \\ 
\\
All site-relative links (URL starting with {{"/"}}) should have been prepended 
with the uiRoot prefix {color:red}{{/testwebuiproxy/..}}{color} like this ...
{code:xml|borderColor=#0c0}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}
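
A minimal sketch of the intended behavior (hypothetical, not the actual {{HistoryPage}} fix), assuming the prefix is taken from {{APPLICATION_WEB_PROXY_BASE}}:
{code}
// Minimal sketch: prepend the proxy base to site-relative links, the way
// the other Spark web UI pages honor uiRoot.
val uiRoot: String = sys.env.getOrElse("APPLICATION_WEB_PROXY_BASE", "")

def prependUiRoot(link: String): String =
  if (link.startsWith("/")) uiRoot + link else link

// prependUiRoot("/history/local-1445896187531")
//   == "/testwebuiproxy/../history/local-1445896187531"
//   when APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}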



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)

2015-10-26 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-11338:
-
Description: 
Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
APPLICATION_WEB_PROXY_BASE=}}). This makes it 
impossible/impractical to expose the *History Server* in a multi-tenancy 
environment where each Spark service instance has one history server behind a 
multi-tenant enabled proxy server.  All other Spark web UI pages are correctly 
prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.

*Repro steps:*\\
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory  logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path (e.g. {{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$  sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the 
application link in the HTML source text:
{code:xml|borderColor=#c00}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}
*Notice*, application link "{{/history/local-1445896187531}}" does _not_ have 
the prefix {{/testwebuiproxy/..}} \\ \\
All site-relative links (URL starting with {{"/"}}) should have been prepended 
with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
{code:xml|borderColor=#0c0}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}

  was:
Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
APPLICATION_WEB_PROXY_BASE=}}). This makes it 
impossible/impractical to expose the *History Server* in a multi-tenancy 
environment where each Spark service instance has one history server behind a 
multi-tenant enabled proxy server.  All other Spark web UI pages are correctly 
prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.

*Repro steps:*\\
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory  logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path ({{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$  sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the 
application link in the HTML source text:
{code:xml|borderColor=#c00}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}
*Notice*, application link does _not_ have the prefix {{/testwebuiproxy/..}} \\ 
\\
All site-relative links (URL starting with {{"/"}}) should have been prepended 
with the uiRoot prefix {color:red}{{/testwebuiproxy/..}}{color} like this ...
{code:xml|borderColor=#0c0}
...
<th>App ID</th><th>App Name</th>...
  <tbody>
    <tr>
      <td><a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
  ...
{code}


> HistoryPage not multi-tenancy enabled (app links not prefixed with 
> APPLICATION_WEB_PROXY_BASE)
> --
>
> Key: SPARK-11338
> URL: https://issues.apache.org/jira/browse/SPARK-11338
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Christian Kadner
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
> APPLICATION_WEB_PROXY_BASE=}}). This makes it 
> impossible/impractical to expose the *History Server* in a multi-tenancy 
> environment where each Spark service instance has one history server behind a 
> multi-tenant enabled proxy server.  All other Spark web UI pages are 
> correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.
> *Repro steps:*\\
> # Configure history log collection:
> {code:title=conf/spark-defaults.conf|borderStyle=solid}
> spark.eventLog.enabled true
> spark.eventLog.dir logs/history
> spark.history.fs.logDirectory  logs/history
> {code}
> 

[jira] [Created] (SPARK-9211) HiveComparisonTest generates incorrect file name for golden answer files on Windows

2015-07-21 Thread Christian Kadner (JIRA)
Christian Kadner created SPARK-9211:
---

 Summary: HiveComparisonTest generates incorrect file name for 
golden answer files on Windows
 Key: SPARK-9211
 URL: https://issues.apache.org/jira/browse/SPARK-9211
 Project: Spark
  Issue Type: Test
  Components: SQL, Windows
Affects Versions: 1.4.1
 Environment: Windows
Reporter: Christian Kadner
Priority: Minor


The names of the golden answer files for the Hive test cases (test suites based 
on {{HiveComparisonTest}}) are generated using an MD5 hash of the query text. 
When the query text contains line breaks then the generated MD5 hash differs 
between Windows and Linux/OSX ({{\r\n}} vs {{\n}}).

This results in erroneously created golden answer files from just running a 
Hive comparison test and makes it impossible to modify or add new test cases 
with correctly named golden answer files on Windows.
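
A sketch of a possible normalization (hypothetical, not the actual fix) is to unify line endings before hashing, so the same query yields the same golden-answer file name on every platform:
{code}
import java.security.MessageDigest

// Hypothetical sketch: normalize CRLF to LF before computing the MD5 hash so
// golden answer file names are identical on Windows and Linux/OSX.
def goldenAnswerFileName(query: String): String = {
  val normalized = query.replace("\r\n", "\n")
  MessageDigest.getInstance("MD5")
    .digest(normalized.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString
}
{code}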




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-7265:

Labels: spark.tc  (was: )

 Improving documentation for Spark SQL Hive support 
 ---

 Key: SPARK-7265
 URL: https://issues.apache.org/jira/browse/SPARK-7265
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.3.1
Reporter: Jihong MA
Assignee: Jihong MA
Priority: Trivial
  Labels: spark.tc
 Fix For: 1.5.0


 miscellaneous documentation improvement for Spark SQL Hive support, Yarn 
 cluster deployment. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-2859:

Labels: spark.tc  (was: )

 Update url of Kryo project in related docs
 --

 Key: SPARK-2859
 URL: https://issues.apache.org/jira/browse/SPARK-2859
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Guancheng Chen
Assignee: Guancheng Chen
Priority: Trivial
  Labels: spark.tc
 Fix For: 1.0.3, 1.1.0


 Kryo project has been migrated from googlecode to github, hence we need to 
 update its URL in related docs such as tuning.md.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-8639:

Labels: spark.tc  (was: )

 Instructions for executing jekyll in docs/README.md could be slightly more 
 clear, typo in docs/api.md
 -

 Key: SPARK-8639
 URL: https://issues.apache.org/jira/browse/SPARK-8639
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Rosstin Murphy
Assignee: Rosstin Murphy
Priority: Trivial
  Labels: spark.tc
 Fix For: 1.4.1, 1.5.0


 In docs/README.md, the text states around line 31
 Execute 'jekyll' from the 'docs/' directory. Compiling the site with Jekyll 
 will create a directory called '_site' containing index.html as well as the 
 rest of the compiled files.
 It might be more clear if we said
 Execute 'jekyll build' from the 'docs/' directory to compile the site. 
 Compiling the site with Jekyll will create a directory called '_site' 
 containing index.html as well as the rest of the compiled files.
 In docs/api.md: Here you can API docs for Spark and its submodules.
 should be something like: Here you can read API docs for Spark and its 
 submodules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5562) LDA should handle empty documents

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-5562:

Labels: spark.tc  (was: starter)

 LDA should handle empty documents
 -

 Key: SPARK-5562
 URL: https://issues.apache.org/jira/browse/SPARK-5562
 Project: Spark
  Issue Type: Test
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
Assignee: Alok Singh
Priority: Minor
  Labels: spark.tc, starter
 Fix For: 1.5.0

   Original Estimate: 96h
  Remaining Estimate: 96h

 Latent Dirichlet Allocation (LDA) could easily be given empty documents when 
 people select a small vocabulary.  We should check to make sure it is robust 
 to empty documents.
 This will hopefully take the form of a unit test, but may require modifying 
 the LDA implementation.
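
A unit test along these lines (a hypothetical sketch, assuming an existing SparkContext {{sc}}) could exercise that case:
{code}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical sketch of such a test: a corpus that contains an all-zero
// (empty) document; LDA.run() should complete without failing.
val corpus = sc.parallelize(Seq(
  (0L, Vectors.dense(1.0, 2.0, 0.0)),
  (1L, Vectors.dense(0.0, 0.0, 0.0)),   // empty document
  (2L, Vectors.dense(3.0, 0.0, 1.0))))

val model = new LDA().setK(2).setMaxIterations(10).run(corpus)
assert(model.vocabSize == 3)
{code}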



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7357) Improving HBaseTest example

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-7357:

Labels: spark.tc  (was: )

 Improving HBaseTest example
 ---

 Key: SPARK-7357
 URL: https://issues.apache.org/jira/browse/SPARK-7357
 Project: Spark
  Issue Type: Improvement
  Components: Examples
Affects Versions: 1.3.1
Reporter: Jihong MA
Assignee: Jihong MA
Priority: Minor
  Labels: spark.tc
 Fix For: 1.5.0

   Original Estimate: 2m
  Remaining Estimate: 2m

 Minor improvement to HBaseTest example, when Hbase related configurations 
 e.g: zookeeper quorum, zookeeper client port or zookeeper.znode.parent are 
 not set to default (localhost:2181), connection to zookeeper might hang as 
 shown in following stack
 15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, 
 connectString=xxx.xxx.xxx:2181 sessionTimeout=9 
 watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to 
 server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown 
 error)
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to 
 xxx.xxx.xxx/9.30.94.121:2181, initiating session
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete 
 on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, 
 negotiated timeout = 4
 15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper 
 is null
 this is due to hbase-site.xml is not placed on spark class path. 
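
For illustration only (a hypothetical sketch with placeholder values, not the example's actual code), the ZooKeeper settings that would otherwise come from {{hbase-site.xml}} can be set explicitly on the configuration:
{code}
import org.apache.hadoop.hbase.HBaseConfiguration

// Hypothetical sketch: set the ZooKeeper connection details explicitly when
// hbase-site.xml is not on the Spark classpath (hosts are placeholders).
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "xxx.xxx.xxx")
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("zookeeper.znode.parent", "/hbase")
{code}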



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-6785:

Labels: spark.tc  (was: )

 DateUtils can not handle date before 1970/01/01 correctly
 -

 Key: SPARK-6785
 URL: https://issues.apache.org/jira/browse/SPARK-6785
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu
Assignee: Christian Kadner
  Labels: spark.tc
 Fix For: 1.5.0


 {code}
 scala> val d = new Date(100)
 d: java.sql.Date = 1969-12-31
 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
 res1: java.sql.Date = 1970-01-01
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)

2015-07-02 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612584#comment-14612584
 ] 

Christian Kadner commented on SPARK-8746:
-

Thank you Sean!

 Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)
 --

 Key: SPARK-8746
 URL: https://issues.apache.org/jira/browse/SPARK-8746
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Christian Kadner
Assignee: Christian Kadner
Priority: Trivial
  Labels: documentation, test
 Fix For: 1.4.1, 1.5.0

   Original Estimate: 1h
  Remaining Estimate: 1h

 The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) 
 describes how to generate golden answer files for new hive comparison test 
 cases. However the download link for the Hive 0.13.1 jars points to 
 https://hive.apache.org/downloads.html but none of the linked mirror sites 
 still has the 0.13.1 version.
 We need to update the link to 
 https://archive.apache.org/dist/hive/hive-0.13.1/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)

2015-06-30 Thread Christian Kadner (JIRA)
Christian Kadner created SPARK-8746:
---

 Summary: Need to update download link for Hive 0.13.1 jars 
(HiveComparisonTest)
 Key: SPARK-8746
 URL: https://issues.apache.org/jira/browse/SPARK-8746
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Christian Kadner
Priority: Trivial


The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) 
describes how to generate golden answer files for new hive comparison test 
cases. However the download link for the Hive 0.13.1 jars points to 
https://hive.apache.org/downloads.html but none of the linked mirror sites 
still has the 0.13.1 version.

We need to update the link to https://archive.apache.org/dist/hive/hive-0.13.1/




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly

2015-06-30 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549366#comment-14549366
 ] 

Christian Kadner edited comment on SPARK-6785 at 6/30/15 9:50 PM:
--

{panel:borderStyle=dashed|borderColor=#ccc|bgColor=#CE}
Pull Request +[6242|https://github.com/apache/spark/pull/6242]+
{panel}
\\
Before my fix, the from-and-to Java date conversion of dates before 1970 will 
only work for {{java.sql.Date}} objects that reflect a date and time exactly at 
midnight in the System's local time zone. 
Otherwise, if the Date's time is just one millisecond before or after midnight, 
the result of the above conversion will be offset by one day for Dates before 
1970 because of a rounding (truncation) flaw in the function 
{{DateUtils.millisToDays(Long):Int}}

\\

{code}
  scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss

  scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime)
  d1: java.sql.Date = 1969-01-01

  scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime)
  d2: java.sql.Date = 1969-01-01

  scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1))
  res1: java.sql.Date = 1969-01-01

  scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d2))
  res2: java.sql.Date = 1969-01-02
{code}

\\

What is the code doing and how to fix it:

\\

 - A {{java.util.Date}} is represented by milliseconds ({{Long}}) since the 
Epoch (1970/01/01 0:00:00 GMT) with positive numbers for dates after and 
negative numbers for dates before 1970
 
 - The function {{DateUtils.fromJavaDate(java.util.Date):Int}} calculates the 
number of full days passed since 1970/01/01 00:00:00 (local time, not UTC), but 
by using the data type {{Long}} (as opposed to {{Double}}) when  converting 
milliseconds to days it essentially truncates the fractional part of days 
passed (disregarding the impact of hours, minutes, seconds)
 
 - The function {{DateUtils.toJavaDate(Int):Date}} converts the given number of 
days into milliseconds and adds it to 1970/01/01 00:00:00 (local time, not UTC)

 - _Side note: The time-zone offset from UTC is factored in when converting a 
Date to days and removed when converting days to Date, so the time-zone 
shifting is neutralized in the round-trip conversion 
{{toJavaDate(fromJavaDate(java.util.Date))}}._
 
 - The truncation of partial days is not a problem for dates after 1970 since 
adding a fraction of a day to any date will not flip the calendar to the next 
day (since all our Dates start 0:00:00 AM)
 
 - That truncation of partial days, however, is a problem when subtracting even a 
second from a {{Date}} with time at 0:00:00 AM, which should turn the calendar 
back one day to the previous date
 
 - Ideally the date conversion should be done using milliseconds, but since 
using days has been established already, the fix is to work with {{Double}} to 
preserve fractions of days and use {{floor()}} instead of the implicit truncate 
to round to a full number of days ({{Int}})

\\

Pseudo-code example, adding or subtracting 1 hour to Date 1970/01/01 0:00:00 
using milliseconds...

{code}
1970-01-01 0:00:00 + 1 hr = 1970-01-01  1:00:00
1970-01-01 0:00:00 - 1 hr = 1969-12-31 23:00:00
{code}

\\

Same example, using full days. One hour is about 0.04 days. Using {{trunc()}} 
versus {{floor()}} we get ...  

{code}
trunc(+0.04) = +0  -->  1970-01-01 + 0 days  = 1970-01-01   (correct)
floor(+0.04) = +0  -->  1970-01-01 + 0 days  = 1970-01-01   (correct)

trunc(-0.04) = -0  -->  1970-01-01 + -0 days = 1970-01-01   (incorrect, bug)
floor(-0.04) = -1  -->  1970-01-01 + -1 day  = 1969-12-31   (correct, fix)
{code}

{code} 
def trunc(d: Double): Int = d.toInt
{code}
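
A minimal sketch of the proposed change (not the exact Spark patch) replaces the truncating integer division with a {{floor()}} over a {{Double}}:
{code}
// Minimal sketch: divide as Double and floor(), so partial days before the
// epoch round down to the previous day instead of truncating toward zero.
// Time-zone offset handling is omitted here.
val MILLIS_PER_DAY: Long = 24L * 60 * 60 * 1000

def millisToDays(millisLocal: Long): Int =
  math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt

// millisToDays(-1000) == -1  --> 1969-12-31  (was 0 --> 1970-01-01)
// millisToDays(1000)  ==  0  --> 1970-01-01  (unchanged)
{code}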


was (Author: ckadner):
{panel:borderStyle=dashed|borderColor=#ccc|bgColor=#CE}
Please review only my second Pull Request 
+[6242|https://github.com/apache/spark/pull/6242]+ and ignore my first Pull 
Request -[6236|https://github.com/apache/spark/pull/6236]-
Thank you!
{panel}
\\
Before my fix, the from-and-to Java date conversion of dates before 1970 will 
only work for {{java.sql.Date}} objects that reflect a date and time exactly at 
midnight in the System's local time zone. 
Otherwise, if the Date's time is just one millisecond before or after midnight, 
the result of the above conversion will be offset by one day for Dates before 
1970 because of a rounding (truncation) flaw in the function 
{{DateUtils.millisToDays(Long):Int}}

\\

{code}
  scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss

  scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime)
  d1: java.sql.Date = 1969-01-01

  scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime)
  d2: java.sql.Date = 1969-01-01

  scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1))
  res1: 

[jira] [Comment Edited] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly

2015-05-18 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549366#comment-14549366
 ] 

Christian Kadner edited comment on SPARK-6785 at 5/18/15 11:21 PM:
---

{panel:borderStyle=dashed|borderColor=#ccc|bgColor=#CE}
Please review only my second Pull Request 
+[6242|https://github.com/apache/spark/pull/6242]+ and ignore my first Pull 
Request -[6236|https://github.com/apache/spark/pull/6236]-
Thank you!
{panel}
\\
Before my fix, the from-and-to Java date conversion of dates before 1970 will 
only work for {{java.sql.Date}} objects that reflect a date and time exactly at 
midnight in the System's local time zone. 
Otherwise, if the Date's time is just one millisecond before or after midnight, 
the result of the above conversion will be offset by one day for Dates before 
1970 because of a rounding (truncation) flaw in the function 
{{DateUtils.millisToDays(Long):Int}}

\\

{code}
  scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss

  scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime)
  d1: java.sql.Date = 1969-01-01

  scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime)
  d2: java.sql.Date = 1969-01-01

  scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1))
  res1: java.sql.Date = 1969-01-01

  scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d2))
  res2: java.sql.Date = 1969-01-02
{code}

\\

What is the code doing and how to fix it:

\\

 - A {{java.util.Date}} is represented by milliseconds ({{Long}}) since the 
Epoch (1970/01/01 0:00:00 GMT) with positive numbers for dates after and 
negative numbers for dates before 1970
 
 - The function {{DateUtils.fromJavaDate(java.util.Date):Int}} calculates the 
number of full days passed since 1970/01/01 00:00:00 (local time, not UTC), but 
by using the data type {{Long}} (as opposed to {{Double}}) when  converting 
milliseconds to days it essentially truncates the fractional part of days 
passed (disregarding the impact of hours, minutes, seconds)
 
 - The function {{DateUtils.toJavaDate(Int):Date}} converts the given number of 
days into milliseconds and adds it to 1970/01/01 00:00:00 (local time, not UTC)

 - _Side note: The time-zone offset from UTC is factored in when converting a 
Date to days and removed when converting days to Date, so the time-zone 
shifting is neutralized in the round-trip conversion 
{{toJavaDate(fromJavaDate(java.util.Date))}}._
 
 - The truncation of partial days is not a problem for dates after 1970 since 
adding a fraction of a day to any date will not flip the calendar to the next 
day (since all our Dates start 0:00:00 AM)
 
 - That truncation of partial days, however, is a problem when subtracting even a 
second from a {{Date}} with time at 0:00:00 AM, which should turn the calendar 
back one day to the previous date
 
 - Ideally the date conversion should be done using milliseconds, but since 
using days has been established already, the fix is to work with {{Double}} to 
preserve fractions of days and to use {{floor()}} instead of the implicit 
truncation when rounding to a full number of days ({{Int}})

\\

Pseudo-code example, adding or subtracting 1 hour to/from the Date 
1970/01/01 0:00:00 using milliseconds...

{code}
1970-01-01 0:00:00 + 1 hr = 1970-01-01  1:00:00
1970-01-01 0:00:00 - 1 hr = 1969-12-31 23:00:00
{code}

\\

Same example, using full days. One hour is about 0.04 days. Using {{trunc()}} 
versus {{floor()}} we get ...  

{code}
trunc(+0.04) = +0  -->  1970-01-01 + 0 days = 1970-01-01    (correct)
floor(+0.04) = +0  -->  1970-01-01 + 0 days = 1970-01-01    (correct)

trunc(-0.04) = -0  -->  1970-01-01 + -0 days = 1970-01-01   (incorrect, bug)
floor(-0.04) = -1  -->  1970-01-01 + -1 day  = 1969-12-31   (correct, fix)
{code}

{code} 
// truncation via toInt rounds toward zero, e.g. trunc(-0.04) == 0 instead of -1
def trunc(d: Double): Int = d.toInt
{code}
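
\\

To make the difference concrete, here is a minimal, self-contained Scala sketch 
of the two conversion variants. The names ({{millisToDaysTruncating}}, 
{{millisToDaysFlooring}}, {{daysToJavaDate}}) and the simplified time-zone 
handling are made up for this illustration only; this is not the actual Spark 
code, which is changed in the pull request linked above.

{code}
import java.sql.Date
import java.util.TimeZone

object DateConversionSketch {
  private val MillisPerDay = 86400000L

  private def tzOffset(millis: Long): Int = TimeZone.getDefault.getOffset(millis)

  // Buggy variant: Long division truncates toward zero, so a pre-1970 date
  // whose time is just after midnight comes back one calendar day too late.
  def millisToDaysTruncating(millis: Long): Int =
    ((millis + tzOffset(millis)) / MillisPerDay).toInt

  // Fixed variant: floor() rounds toward negative infinity, so a fractional
  // (negative) number of days is rounded down to the previous full day.
  def millisToDaysFlooring(millis: Long): Int =
    math.floor((millis + tzOffset(millis)).toDouble / MillisPerDay).toInt

  // Reverse conversion: days since 1970-01-01 (local time) back to a Date.
  def daysToJavaDate(days: Int): Date = {
    val millis = days.toLong * MillisPerDay
    new Date(millis - tzOffset(millis))
  }

  def main(args: Array[String]): Unit = {
    val midnight       = Date.valueOf("1969-01-01")          // 1969-01-01 00:00:00 local time
    val oneSecondLater = new Date(midnight.getTime + 1000L)   // 1969-01-01 00:00:01 local time

    println(daysToJavaDate(millisToDaysTruncating(oneSecondLater.getTime))) // 1969-01-02 (off by one day)
    println(daysToJavaDate(millisToDaysFlooring(oneSecondLater.getTime)))   // 1969-01-01 (expected)
  }
}
{code}

Pasting the object into a Scala REPL and calling 
{{DateConversionSketch.main(Array())}} prints the off-by-one result for the 
truncating variant and the expected date for the flooring variant.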



 DateUtils can not handle date before 1970/01/01 correctly
 -

 Key: SPARK-6785
 URL: https://issues.apache.org/jira/browse/SPARK-6785
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu

 {code}
 scala> val d = new Date(100)
 d: java.sql.Date = 1969-12-31
 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
 res1: java.sql.Date = 1970-01-01
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly

2015-05-13 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542697#comment-14542697
 ] 

Christian Kadner commented on SPARK-6785:
-

Hi Patrick,
I would like to work on this issue. Seems like the date conversion is thrown 
off by the time-zone adjustments and the fact that the interchange type is days 
instead of millis. I am preparing a pull-request which will also include test 
cases to cover more date conversion scenarios.



 DateUtils can not handle date before 1970/01/01 correctly
 -

 Key: SPARK-6785
 URL: https://issues.apache.org/jira/browse/SPARK-6785
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu

 {code}
 scala> val d = new Date(100)
 d: java.sql.Date = 1969-12-31
 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
 res1: java.sql.Date = 1970-01-01
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540217#comment-14540217
 ] 

Christian Kadner edited comment on SPARK-4128 at 5/12/15 5:04 PM:
--

Hi Sean,

while there is still a section covering the IntelliJ setup, what is missing are 
these steps (or an updated version of them), which have to be taken in order to 
get a successful Make of the project. I needed to do some version of them for 
1.3.0, 1.3.1, and 1.4.0.

part of Patrick's deleted paragraph - start
...
At the top of the leftmost pane, make sure the Project/Packages selector 
is set to Packages.
Right click on any package and click “Open Module Settings” - you will be 
able to modify any of the modules here.
A few of the modules need to be modified slightly from the default import.
Add sources to the following modules: Under “Sources” tab add a source 
on the right. 
spark-hive: add v0.13.1/src/main/scala
spark-hive-thriftserver v0.13.1/src/main/scala
spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala
For spark-yarn click “Add content root” and navigate in the filesystem 
to yarn/common directory of Spark
...
part of Patrick's deleted paragraph - end


I suggest adding an updated version of that to the wiki, since some of the 
modules are set up in a way that similar non-obvious manual steps are required 
to make them compile.


was (Author: ckadner):
Hi Sean,

while there is still a section covering the IntelliJ setup, what is missing are 
these steps, or an updated version of them, which I had to follow for 1.3.0, 
1.3.1, and 1.4.0 in order to get a successful Make of the project.

part of Patrick's deleted paragraph - start
...
At the top of the leftmost pane, make sure the Project/Packages selector 
is set to Packages.
Right click on any package and click “Open Module Settings” - you will be 
able to modify any of the modules here.
A few of the modules need to be modified slightly from the default import.
Add sources to the following modules: Under “Sources” tab add a source 
on the right. 
spark-hive: add v0.13.1/src/main/scala
spark-hive-thriftserver v0.13.1/src/main/scala
spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala
For spark-yarn click “Add content root” and navigate in the filesystem 
to yarn/common directory of Spark
...
part of Patrick's deleted paragraph - end


I suggest adding an updated version of that to the wiki, since some of the 
modules are set up in a way that similar non-obvious manual steps are required 
to make them compile.

 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540217#comment-14540217
 ] 

Christian Kadner commented on SPARK-4128:
-

Hi Sean,

while there is still a section covering the IntelliJ setup, what is missing are 
these steps, or an updated version of them, which I had to follow for 1.3.0, 
1.3.1, and 1.4.0 in order to get a successful Make of the project.

part of Patrick's deleted paragraph - start
...
At the top of the leftmost pane, make sure the Project/Packages selector 
is set to Packages.
Right click on any package and click “Open Module Settings” - you will be 
able to modify any of the modules here.
A few of the modules need to be modified slightly from the default import.
Add sources to the following modules: Under “Sources” tab add a source 
on the right. 
spark-hive: add v0.13.1/src/main/scala
spark-hive-thriftserver v0.13.1/src/main/scala
spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala
For spark-yarn click “Add content root” and navigate in the filesystem 
to yarn/common directory of Spark
part of Patrick's deleted paragraph - end


I suggest adding an updated version of that to the wiki, since some of the 
modules are set up in a way that similar non-obvious manual steps are required 
to make them compile.

 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540217#comment-14540217
 ] 

Christian Kadner edited comment on SPARK-4128 at 5/12/15 5:02 PM:
--

Hi Sean,

while there is still a section covering the IntelliJ setup, what is missing are 
these steps, or an updated version of them, which I had to follow for 1.3.0, 
1.3.1, and 1.4.0 in order to get a successful Make of the project.

part of Patrick's deleted paragraph - start
...
At the top of the leftmost pane, make sure the Project/Packages selector 
is set to Packages.
Right click on any package and click “Open Module Settings” - you will be 
able to modify any of the modules here.
A few of the modules need to be modified slightly from the default import.
Add sources to the following modules: Under “Sources” tab add a source 
on the right. 
spark-hive: add v0.13.1/src/main/scala
spark-hive-thriftserver v0.13.1/src/main/scala
spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala
For spark-yarn click “Add content root” and navigate in the filesystem 
to yarn/common directory of Spark
...
part of Patrick's deleted paragraph - end


I suggest adding an updated version of that to the wiki, since some of the 
modules are set up in a way that similar non-obvious manual steps are required 
to make them compile.


was (Author: ckadner):
Hi Sean,

while there is still a section covering the IntelliJ setup, what is missing are 
these steps, or an updated version of them, which I had to follow for 1.3.0, 
1.3.1, and 1.4.0 in order to get a successful Make of the project.

part of Patrick's deleted paragraph - start
...
At the top of the leftmost pane, make sure the Project/Packages selector 
is set to Packages.
Right click on any package and click “Open Module Settings” - you will be 
able to modify any of the modules here.
A few of the modules need to be modified slightly from the default import.
Add sources to the following modules: Under “Sources” tab add a source 
on the right. 
spark-hive: add v0.13.1/src/main/scala
spark-hive-thriftserver v0.13.1/src/main/scala
spark-repl: scala-2.10/src/main/scala and scala-2.10/src/test/scala
For spark-yarn click “Add content root” and navigate in the filesystem 
to yarn/common directory of Spark
part of Patrick's deleted paragraph - end


I suggest adding an updated version of that to the wiki, since some of the 
modules are set up in a way that similar non-obvious manual steps are required 
to make them compile.

 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540502#comment-14540502
 ] 

Christian Kadner edited comment on SPARK-4128 at 5/12/15 7:11 PM:
--

Yes, I encountered these compile problems after a fresh import of the Spark 
1.3.0 and 1.3.1 projects from the download (.tgz), and of 1.4 when loaded from 
a Git repository.

For Scala 2.10/2.11 support, I suppose either one should be chosen by default 
without having to run a script. Btw, that should be doc'd as well ;-)


was (Author: ckadner):
Yes, I encountered these compile problems after a fresh import of the Spark 1.4 
project both when downloaded (tar/zip) and when loaded from a Git repository.

For Scala 2.10/2.11 support, I suppose either one should be chosen by default 
without having to run a script. Btw, that should be doc'd as well ;-)

 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540397#comment-14540397
 ] 

Christian Kadner commented on SPARK-4128:
-

Not every user may care about each of the modules, and yes, these instructions 
may need to be revised.

Yet I strongly think there should be some general text, maybe under Other 
Tips, that explains the need to manually update the Module settings to mark 
additional folders as Source folders (after selecting the right combination of 
Profiles and doing a Generate Sources).

For spark-hive this seems to still be true.

Patrick had written this comment in one of his emails, which is helpful for 
understanding why that needs to be done.

 In some cases in the maven build we now have pluggable source
 directories based on profiles using the maven build helper plug-in.
 This is necessary to support cross building against different Hive
 versions, and there will be additional instances of this due to
 supporting scala 2.11 and 2.10.

 In these cases, you may need to add source locations explicitly to
 intellij if you want the entire project to compile there.

 Unfortunately as long as we support cross-building like this, it will
 be an issue. Intellij's maven support does not correctly detect our
 use of the maven-build-plugin to add source directories.

Besides fixing the module settings for spark-hive, I had to change the 
flume-sink module settings to mark the 
target\scala-2.10\src_managed\main\compiled_avro folder as an additional Source 
Folder.



 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540397#comment-14540397
 ] 

Christian Kadner edited comment on SPARK-4128 at 5/12/15 6:23 PM:
--

Not every user may care about each of the modules, and yes, these instructions 
may need to be revised.

Yet I strongly think there should be some general text, maybe under Other 
Tips, that explains the need to manually update the Module settings to mark 
additional folders as Source folders (after selecting the right combination of 
Profiles and doing a Generate Sources).

For spark-hive this seems to still be true.

Patrick had written this comment in one of his emails, which is helpful to 
understand why that needs to be done.

 In some cases in the maven build we now have pluggable source
 directories based on profiles using the maven build helper plug-in.
 This is necessary to support cross building against different Hive
 versions, and there will be additional instances of this due to
 supporting scala 2.11 and 2.10.

 In these cases, you may need to add source locations explicitly to
 intellij if you want the entire project to compile there.

 Unfortunately as long as we support cross-building like this, it will
 be an issue. Intellij's maven support does not correctly detect our
 use of the maven-build-plugin to add source directories.

Besides fixing the module settings for spark-hive, I had to change the 
flume-sink module settings to mark the 
target\scala-2.10\src_managed\main\compiled_avro folder as an additional Source 
Folder.




was (Author: ckadner):
Not every user may care about each of the modules, and yes, these instructions 
may need to be revised.

Yet I strongly think there should be some general text, maybe under Other 
Tips, that explains the need to manually update the Module settings to mark 
additional folders as Source folders (after selecting the right combination of 
Profiles and doing a Generate Sources).

For spark-hive this seems to still be true.

Patrick had written this comment in one of his emails, which is helpful for 
understanding why that needs to be done.

 In some cases in the maven build we now have pluggable source
 directories based on profiles using the maven build helper plug-in.
 This is necessary to support cross building against different Hive
 versions, and there will be additional instances of this due to
 supporting scala 2.11 and 2.10.

 In these cases, you may need to add source locations explicitly to
 intellij if you want the entire project to compile there.

 Unfortunately as long as we support cross-building like this, it will
 be an issue. Intellij's maven support does not correctly detect our
 use of the maven-build-plugin to add source directories.

Besides fixing the module settings for spark-hive, I had to change the 
flume-sink module settings to mark the 
target\scala-2.10\src_managed\main\compiled_avro folder as an additional Source 
Folder.



 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540502#comment-14540502
 ] 

Christian Kadner commented on SPARK-4128:
-

Yes, I encountered these compile problems after a fresh import of the Spark 1.4 
project both when downloaded (tar/zip) and when loaded from a Git repository.

For Scala 2.10/2.11 support, I suppose either one should be chosen by default 
without having to run a script. Btw, that should be doc'd as well ;-)

 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540727#comment-14540727
 ] 

Christian Kadner commented on SPARK-4128:
-

Hi Sean, 
based on what Patrick described, I would propose this text under IDE Setup > 
IntelliJ > Other Tips:

<!-- start -->

Some of the modules have pluggable source directories based on Maven 
profiles (i.e. to support both Scala 2.11 and 2.10 or to allow cross building 
against different versions of Hive). In some cases IntelliJ's Maven support 
does not correctly detect our use of the maven-build-plugin to add source 
directories. In these cases, you may need to add source locations explicitly to 
compile the entire project.

- open the Project Settings and select Modules 
- based on your selected Maven profiles, you may need to add source 
folders to the following modules:
spark-hive: add v0.13.1/src/main/scala
spark-streaming-flume-sink: add 
target\scala-2.10\src_managed\main\compiled_avro

<!-- end -->

In addition we could quote the compilation errors, so other developers will 
find this solution when they search the web to troubleshoot these issues.


 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-12 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540776#comment-14540776
 ] 

Christian Kadner commented on SPARK-4128:
-

Thank you Sean!

 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-11 Thread Christian Kadner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539122#comment-14539122
 ] 

Christian Kadner commented on SPARK-4128:
-

Hi Patrick, 
I recently set up my IntelliJ IDEA development environment for Apache Spark and 
I struggled with a few of the same/similar compilation errors that were 
described in this email thread 
https://www.mail-archive.com/dev@spark.apache.org/msg06070.html.

You had added a helpful paragraph to the wiki on Nov 20, 2014 but you removed 
it again on Jan 9, 2015. Did you find a better solution or a more pertinent 
place to put this information?

 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org