[jira] [Updated] (SPARK-4331) SBT Scalastyle doesn't work for the sources under hive's v0.12.0 and v0.13.1

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4331:

Target Version/s: 1.4.0  (was: 1.3.0)

 SBT Scalastyle doesn't work for the sources under hive's v0.12.0 and v0.13.1
 

 Key: SPARK-4331
 URL: https://issues.apache.org/jira/browse/SPARK-4331
 Project: Spark
  Issue Type: Bug
  Components: Build, SQL
Affects Versions: 1.3.0
Reporter: Kousuke Saruta

 v0.13.1 and v0.12.0 are not standard directory structures for sbt's scalastyle 
 plugin, so scalastyle doesn't work for sources under those directories.






[jira] [Updated] (SPARK-4119) Don't rely on HIVE_DEV_HOME to find .q files

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4119:

Target Version/s: 1.4.0  (was: 1.3.0)

 Don't rely on HIVE_DEV_HOME to find .q files
 

 Key: SPARK-4119
 URL: https://issues.apache.org/jira/browse/SPARK-4119
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 1.1.1
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Minor

 After merging in Hive 0.13.1 support, a bunch of .q files and golden answer 
 files got updated. Unfortunately, some .q files were also updated in Hive itself. For example, 
 an ORDER BY clause was added to groupby1_limit.q for a bug fix.
 With HIVE_DEV_HOME set, developers working on Hive 0.12.0 may end up with 
 false test failures, because .q files are looked up from HIVE_DEV_HOME and 
 outdated .q files are used.






[jira] [Updated] (SPARK-2472) Spark SQL Thrift server sometimes assigns wrong job group name

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2472:

Target Version/s: 1.4.0  (was: 1.3.0)

 Spark SQL Thrift server sometimes assigns wrong job group name
 --

 Key: SPARK-2472
 URL: https://issues.apache.org/jira/browse/SPARK-2472
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Cheng Lian
Priority: Minor

 Sample beeline session used to reproduce this issue:
 {code}
 0: jdbc:hive2://localhost:1 drop table test;
 +-+
 | result  |
 +-+
 +-+
 No rows selected (0.614 seconds)
 0: jdbc:hive2://localhost:1 create table hive_table_copy as select * 
 from hive_table;
 +--++
 | key  | value  |
 +--++
 +--++
 No rows selected (0.493 seconds)
 0
 {code}
 The second statement results in two stages; the first stage is labeled with 
 the first {{drop table}} statement rather than the CTAS statement.
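 For context, a minimal sketch of how a statement's label is normally attached to 
 the jobs it spawns, by setting the job group on the SparkContext *before* the 
 statement runs (the statement text and group id below are hypothetical, not the 
 Thrift server's actual code):
 {code}
 // Assumes an existing SparkContext `sc` and HiveContext `hiveContext`.
 val statement = "CREATE TABLE hive_table_copy AS SELECT * FROM hive_table"

 // Label the jobs for this statement before running it, so every stage it
 // spawns shows up under the right description in the UI.
 sc.setJobGroup("statement-1", statement)
 hiveContext.sql(statement).collect()
 sc.clearJobGroup()
 {code}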






[jira] [Updated] (SPARK-5165) Add support for rollup and cube in sqlcontext

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-5165:

Target Version/s: 1.4.0  (was: 1.3.0)

 Add support for rollup and cube in sqlcontext
 -

 Key: SPARK-5165
 URL: https://issues.apache.org/jira/browse/SPARK-5165
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 1.2.0
Reporter: Fei Wang

 Add support for rollup and cube in sqlcontext
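 For illustration, the kind of aggregation this would enable; the syntax below 
 follows HiveQL's GROUP BY ... WITH ROLLUP / WITH CUBE, and the table and column 
 names are hypothetical:
 {code}
 // Assumes a context `sqlContext` whose parser understands rollup/cube and a
 // table sales(region, product, amount).
 sqlContext.sql(
   "SELECT region, product, SUM(amount) FROM sales GROUP BY region, product WITH ROLLUP")
 sqlContext.sql(
   "SELECT region, product, SUM(amount) FROM sales GROUP BY region, product WITH CUBE")
 {code}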






[jira] [Updated] (SPARK-4760) ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size for tables created from Parquet files

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4760:

Target Version/s: 1.4.0  (was: 1.3.0)

 ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size 
 for tables created from Parquet files
 --

 Key: SPARK-4760
 URL: https://issues.apache.org/jira/browse/SPARK-4760
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: Jianshi Huang
Priority: Critical

 In an older Spark version built around Oct. 12, I was able to use 
   ANALYZE TABLE table COMPUTE STATISTICS noscan
 to get an estimated table size, which is important for optimizing joins. (I'm 
 joining 15 small dimension tables, and this is crucial to me.)
 In the more recent Spark builds, it fails to estimate the table size unless I 
 remove noscan.
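 For reference, a minimal sketch of the steps described above (the table name is 
 hypothetical):
 {code}
 // Assumes an existing HiveContext `hiveContext` and an external Parquet-backed table.
 hiveContext.sql("ANALYZE TABLE my_table COMPUTE STATISTICS noscan")
 // Inspect the recorded statistics (totalSize etc.) in the table parameters.
 hiveContext.sql("DESCRIBE EXTENDED my_table").collect().foreach(println)
 {code}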
 Here are the statistics I got using DESC EXTENDED:
 old:
 parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1417763591, totalSize=56166}
 new:
 parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892, 
 COLUMN_STATS_ACCURATE=false, totalSize=0, numRows=-1, rawDataSize=-1}
 I've also tried turning off spark.sql.hive.convertMetastoreParquet in my 
 spark-defaults.conf, and the result is unaffected in both versions.
 It looks like the Parquet support in the new Hive (0.13.1) is broken?
 Jianshi






[jira] [Updated] (SPARK-5295) Stabilize data types

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-5295:

Target Version/s: 1.4.0  (was: 1.3.0)

 Stabilize data types
 

 Key: SPARK-5295
 URL: https://issues.apache.org/jira/browse/SPARK-5295
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Yin Huai

 1. We expose all the stuff in data types right now, including NumericTypes, 
 etc. These should be hidden from users. We should only expose the leaf types.
 2. Remove the DeveloperAPI tag from the common types.
 3. Specify the internal type, external Scala type, and external Java type for 
 each data type.
 4. Add conversion functions between the internal type, external Scala type, and 
 external Java type to each type (a rough sketch follows below).
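 A rough sketch of the shape items 1, 3 and 4 describe, using hypothetical names 
 rather than the actual Spark SQL classes:
 {code}
 // Hypothetical sketch only; names and signatures are illustrative, not Spark's API.
 abstract class DataType                               // exposed to users
 sealed abstract class NumericType extends DataType    // would be hidden: not a leaf type
 case object IntegerType extends NumericType           // leaf types stay public
 case object DoubleType extends NumericType

 // Items 3 and 4: each type records its internal, external Scala, and external
 // Java representations, plus conversions between them.
 trait TypeConversions {
   type InternalType
   type ScalaType
   type JavaType
   def toScala(v: InternalType): ScalaType
   def toJava(v: InternalType): JavaType
 }
 {code}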






[jira] [Updated] (SPARK-5100) Spark Thrift server monitor page

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-5100:

Target Version/s: 1.4.0  (was: 1.3.0)

 Spark Thrift server monitor page
 

 Key: SPARK-5100
 URL: https://issues.apache.org/jira/browse/SPARK-5100
 Project: Spark
  Issue Type: New Feature
  Components: SQL, Web UI
Reporter: Yi Tian
Priority: Critical
 Attachments: Spark Thrift-server monitor page.pdf, 
 prototype-screenshot.png


 In the latest Spark release, there is a Spark Streaming tab on the driver web 
 UI, which shows information about the running streaming application. It would be 
 helpful to provide a similar monitor page for the Thrift server, because both 
 streaming applications and the Thrift server are long-running, and their details 
 do not show up on the stage page or job page.






[jira] [Updated] (SPARK-4852) Hive query plan deserialization failure caused by shaded hive-exec jar file when generating golden answers

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4852:

Target Version/s: 1.4.0  (was: 1.3.0, 1.2.1)

 Hive query plan deserialization failure caused by shaded hive-exec jar file 
 when generating golden answers
 --

 Key: SPARK-4852
 URL: https://issues.apache.org/jira/browse/SPARK-4852
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: Cheng Lian
Priority: Minor

 When adding Hive 0.13.1 support for Spark SQL Thrift server in PR 
 [2685|https://github.com/apache/spark/pull/2685], Kryo 2.22 used by original 
 hive-exec-0.13.1.jar was shaded by Kryo 2.21 used by Spark SQL because of 
 dependency hell. Unfortunately, Kryo 2.21 has a known bug that may cause Hive 
 query plan deserialization failure. This bug was fixed in Kryo 2.22.
 Normally, this issue doesn't affect Spark SQL because we don't even generate a 
 Hive query plan. But when running Hive test suites like 
 {{HiveCompatibilitySuite}}, golden answer files must be generated by Hive, 
 which triggers this issue. A workaround is to replace 
 {{hive-exec-0.13.1.jar}} under {{$HIVE_HOME/lib}} with Spark's 
 {{hive-exec-0.13.1a.jar}} and {{kryo-2.21.jar}} under 
 {{$SPARK_DEV_HOME/lib_managed/jars}}. Then add {{$HIVE_HOME/lib}} to 
 {{$HADOOP_CLASSPATH}}.
 Upgrading to some newer version of Kryo which is binary compatible with Kryo 
 2.22 (if there is one) may fix this issue.






[jira] [Updated] (SPARK-4476) Use MapType for dict in json which has unique keys in each row.

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4476:

Target Version/s: 1.4.0  (was: 1.3.0)

 Use MapType for dict in json which has unique keys in each row.
 ---

 Key: SPARK-4476
 URL: https://issues.apache.org/jira/browse/SPARK-4476
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Davies Liu
Priority: Critical

 For the jsonRDD like this: 
 {code}
  {a: 1} 
  {b: 2} 
  {c: 3} 
  {d: 4} 
  {e: 5} 
 {code}
 It will create a StructType with 5 fields in it, each field coming from a 
 different row. This becomes a problem if the RDD is large: a StructType with 
 thousands or millions of fields is hard to work with (it will cause stack overflow 
 during serialization).
 It should be MapType in this case. We need a clear rule to decide whether StructType 
 or MapType will be used for a dict in JSON data. 
 cc [~yhuai] [~marmbrus]
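 To make the contrast concrete, a rough sketch of the two schema choices for rows 
 like the above (assuming the 1.3-style {{org.apache.spark.sql.types}} import; 
 illustrative only):
 {code}
 import org.apache.spark.sql.types._

 // Current behavior: one field per distinct JSON key, so the schema grows with the data.
 val asStruct = StructType(Seq(
   StructField("a", IntegerType), StructField("b", IntegerType),
   StructField("c", IntegerType), StructField("d", IntegerType),
   StructField("e", IntegerType)))

 // Proposed for this shape of data: a single map column, so the schema stays constant.
 val asMap = MapType(StringType, IntegerType)
 {code}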






[jira] [Updated] (SPARK-4176) Support decimals with precision > 18 in Parquet

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4176:

Target Version/s: 1.4.0  (was: 1.3.0)

 Support decimals with precision > 18 in Parquet
 ---

 Key: SPARK-4176
 URL: https://issues.apache.org/jira/browse/SPARK-4176
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Matei Zaharia

 After https://issues.apache.org/jira/browse/SPARK-3929, only decimals with 
 precision <= 18 (that can be read into a Long) will be readable from 
 Parquet, so we still need more work to support these larger ones.
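 For context on the 18-digit threshold: a decimal with precision of at most 18 has 
 an unscaled value whose magnitude is below 10^18, which fits in a signed 64-bit 
 Long; anything larger may overflow it and needs a wider encoding. A quick check of 
 that boundary (illustrative only):
 {code}
 // Why 18 is the cutoff: the largest 18-digit unscaled value still fits in a Long,
 // while a 19-digit value can overflow it.
 val max18 = BigInt(10).pow(18) - 1
 val max19 = BigInt(10).pow(19) - 1
 println(max18.isValidLong)  // true  -> can be read into a Long
 println(max19.isValidLong)  // false -> needs a wider representation
 {code}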






[jira] [Updated] (SPARK-4801) Add CTE capability to HiveContext

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4801:

Target Version/s: 1.4.0  (was: 1.3.0)

 Add CTE capability to HiveContext
 -

 Key: SPARK-4801
 URL: https://issues.apache.org/jira/browse/SPARK-4801
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Jacob Davis

 This is a request to add CTE functionality to HiveContext.  Common Table 
 Expressions were added in Hive 0.13.0 with HIVE-1180.  Using CTE-style syntax 
 within HiveContext currently results in the following "Caused by" message:
 {code}
 Caused by: scala.MatchError: TOK_CTE (of class 
 org.apache.hadoop.hive.ql.parse.ASTNode)
 at org.apache.spark.sql.hive.HiveQl$$anonfun$13.apply(HiveQl.scala:500)
 at org.apache.spark.sql.hive.HiveQl$$anonfun$13.apply(HiveQl.scala:500)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 at scala.collection.AbstractTraversable.map(Traversable.scala:105)
 at org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:500)
 at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:248)
 {code}
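 For reference, a minimal CTE-style query of the kind that currently hits this 
 MatchError (table and column names are hypothetical):
 {code}
 // Assumes an existing HiveContext `hiveContext`; HiveQl has no case for TOK_CTE yet.
 hiveContext.sql(
   """WITH recent AS (SELECT key, value FROM src WHERE key > 100)
     |SELECT key, COUNT(*) FROM recent GROUP BY key""".stripMargin)
 {code}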






[jira] [Updated] (SPARK-5680) Sum function on all null values, should return zero

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-5680:

Target Version/s: 1.4.0  (was: 1.3.0)

 Sum function on all null values, should return zero
 ---

 Key: SPARK-5680
 URL: https://issues.apache.org/jira/browse/SPARK-5680
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Venkata Ramana G
Priority: Minor

 SELECT  sum('a'),  avg('a'),  variance('a'),  std('a') FROM src;
 Current output:
 NULL   NULL   NULL   NULL
 Expected output:
 0.0    NULL   NULL   NULL
 This fixes Hive's udaf_number_format.q test.
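 Until the aggregate itself is fixed, a possible user-side workaround is to coalesce 
 the result explicitly (a sketch, not part of the proposed fix):
 {code}
 // Assumes an existing HiveContext `hiveContext` and the usual Hive `src` test table.
 // COALESCE turns the all-NULL sum into 0.0 while leaving the other aggregates alone.
 hiveContext.sql("SELECT COALESCE(sum('a'), 0.0), avg('a'), variance('a'), std('a') FROM src")
 {code}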






[jira] [Updated] (SPARK-3860) Improve dimension joins

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-3860:

Target Version/s: 1.4.0  (was: 1.3.0)

 Improve dimension joins
 ---

 Key: SPARK-3860
 URL: https://issues.apache.org/jira/browse/SPARK-3860
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Reynold Xin
Assignee: Michael Armbrust
Priority: Critical

 This is an umbrella ticket for improving performance for joining multiple 
 dimension tables.






[jira] [Updated] (SPARK-2087) Clean Multi-user semantics for thrift JDBC/ODBC server.

2015-02-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2087:

Target Version/s: 1.4.0  (was: 1.3.0)

 Clean Multi-user semantics for thrift JDBC/ODBC server.
 ---

 Key: SPARK-2087
 URL: https://issues.apache.org/jira/browse/SPARK-2087
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Priority: Minor

 Configuration and temporary tables should exist per-user.  Cached tables 
 should be shared across users.






[jira] [Commented] (SPARK-5436) Validate GradientBoostedTrees during training

2015-02-16 Thread Chris T (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323289#comment-14323289
 ] 

Chris T commented on SPARK-5436:


I thought about this too, but I think there are cases where a user might wish 
to build a model with N trees and examine the error rate after the fact. If, 
for example, we were worried about finding global vs. local minima, or we wanted 
to assess the rate at which a model started to overfit, or we wanted to do some 
kind of testing. 

There are valid reasons why we might want both a specified number of trees, but 
also have the model scored independently against a testData RDD during the build 
phase. It seems both of these cases could easily be supported concurrently.

 Validate GradientBoostedTrees during training
 -

 Key: SPARK-5436
 URL: https://issues.apache.org/jira/browse/SPARK-5436
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley

 For Gradient Boosting, it would be valuable to compute test error on a 
 separate validation set during training.  That way, training could stop early 
 based on the test error (or some other metric specified by the user).






[jira] [Commented] (SPARK-5436) Validate GradientBoostedTrees during training

2015-02-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323323#comment-14323323
 ] 

Joseph K. Bradley commented on SPARK-5436:
--

Yep, that sounds like what I had in mind:
{code}
  def evaluateEachIteration(data: RDD[LabeledPoint], evaluator or maybe use 
training metric): Array[Double]
{code}
where it essentially calls predict() once but keeps the intermediate results 
after each boosting stage, so that it runs in the same big-O time as predict().
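A rough sketch of that incremental evaluation, written against MLlib's 
DecisionTreeModel rather than the eventual API (the method shape, the squared-error 
metric, and the caching strategy are all assumptions here):
{code}
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.model.DecisionTreeModel
import org.apache.spark.rdd.RDD

// Accumulate the weighted tree predictions one boosting stage at a time, so the
// whole sweep costs roughly one pass per tree -- the same big-O as a single
// predict() over the full ensemble.
def evaluateEachIteration(
    data: RDD[LabeledPoint],
    trees: Array[DecisionTreeModel],
    treeWeights: Array[Double]): Array[Double] = {
  var running = data.map(p => (p, 0.0)).cache()  // (point, prediction so far)
  trees.indices.map { i =>
    val prev = running
    running = prev.map { case (p, acc) =>
      (p, acc + treeWeights(i) * trees(i).predict(p.features))
    }.cache()
    // squared error of the ensemble truncated after stage i + 1
    val error = running.map { case (p, acc) => math.pow(p.label - acc, 2) }.mean()
    prev.unpersist()
    error
  }.toArray
}
{code}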

 Validate GradientBoostedTrees during training
 -

 Key: SPARK-5436
 URL: https://issues.apache.org/jira/browse/SPARK-5436
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley

 For Gradient Boosting, it would be valuable to compute test error on a 
 separate validation set during training.  That way, training could stop early 
 based on the test error (or some other metric specified by the user).






[jira] [Created] (SPARK-5846) Spark SQL should set job description and pool *before* running jobs

2015-02-16 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-5846:
-

 Summary: Spark SQL should set job description and pool *before* 
running jobs
 Key: SPARK-5846
 URL: https://issues.apache.org/jira/browse/SPARK-5846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.1, 1.3.0
Reporter: Kay Ousterhout
Assignee: Kay Ousterhout


Spark SQL currently sets the scheduler pool and job description AFTER jobs run 
(see 
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala#L168
 -- which happens after calling hiveContext.sql).  This should be done before 
the job is run.
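A minimal sketch of the intended ordering (the pool name, group id, and query are 
hypothetical):
{code}
// Assumes an existing SparkContext `sc` and HiveContext `hiveContext`.
// Set the pool and description first, so every job the statement spawns inherits them.
sc.setLocalProperty("spark.scheduler.pool", "fair-pool")
sc.setJobGroup("statement-42", "SELECT COUNT(*) FROM logs")
try {
  hiveContext.sql("SELECT COUNT(*) FROM logs").collect()
} finally {
  sc.clearJobGroup()
  sc.setLocalProperty("spark.scheduler.pool", null)
}
{code}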






[jira] [Commented] (SPARK-5005) Failed to start spark-shell when using yarn-client mode with the Spark1.2.0

2015-02-16 Thread anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323683#comment-14323683
 ] 

anuj commented on SPARK-5005:
-

I am having the same issue. @yangping wu, what was the resolution in your case?

 Failed to start spark-shell when using  yarn-client mode with the Spark1.2.0
 

 Key: SPARK-5005
 URL: https://issues.apache.org/jira/browse/SPARK-5005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Shell, YARN
Affects Versions: 1.2.0
 Environment: Spark 1.2.0
 Hadoop 2.2.0
Reporter: yangping wu
Priority: Minor
   Original Estimate: 8h
  Remaining Estimate: 8h

 I am using Spark 1.2.0, but when I start spark-shell in yarn-client 
 mode ({code}MASTER=yarn-client bin/spark-shell{code}), it fails and the error 
 message is
 {code}
 Unknown/unsupported param List(--executor-memory, 1024m, --executor-cores, 8, 
 --num-executors, 2)
 Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options] 
 Options:
   --jar JAR_PATH   Path to your application's JAR file (required)
   --class CLASS_NAME   Name of your application's main class (required)
   --args ARGS  Arguments to be passed to your application's main 
 class.
Mutliple invocations are possible, each will be passed 
 in order.
   --num-executors NUMNumber of executors to start (Default: 2)
   --executor-cores NUM   Number of cores for the executors (Default: 1)
   --executor-memory MEM  Memory per executor (e.g. 1000M, 2G) (Default: 1G)
 {code}
 But when I use Spark 1.1.0 and start spark-shell the same way 
 ({code}MASTER=yarn-client bin/spark-shell{code}), it works.






[jira] [Created] (SPARK-5837) HTTP 500 if try to access Spark UI in yarn-cluster or yarn-client mode

2015-02-16 Thread Marco Capuccini (JIRA)
Marco Capuccini created SPARK-5837:
--

 Summary: HTTP 500 if try to access Spark UI in yarn-cluster or 
yarn-client mode
 Key: SPARK-5837
 URL: https://issues.apache.org/jira/browse/SPARK-5837
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.2.1, 1.2.0
Reporter: Marco Capuccini
Priority: Blocker


Both Spark 1.2.0 and Spark 1.2.1 return this error when I try to access the 
Spark UI while running on YARN (version 2.4.0):
HTTP ERROR 500

Problem accessing /proxy/application_1423564210894_0017/. Reason:

Connection refused

Caused by:

java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.init(Socket.java:425)
at java.net.Socket.init(Socket.java:280)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at 
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:187)
at 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:344)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:79)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 

[jira] [Updated] (SPARK-5831) When checkpoint file size is bigger than 10, then delete them

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5831:
-
Priority: Trivial  (was: Minor)
Assignee: meiyoula

 When checkpoint file size is bigger than 10, then delete them
 -

 Key: SPARK-5831
 URL: https://issues.apache.org/jira/browse/SPARK-5831
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: meiyoula
Assignee: meiyoula
Priority: Trivial
 Fix For: 1.4.0









[jira] [Resolved] (SPARK-5831) When checkpoint file size is bigger than 10, then delete them

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5831.
--
   Resolution: Fixed
Fix Version/s: 1.4.0

Issue resolved by pull request 4621
[https://github.com/apache/spark/pull/4621]

 When checkpoint file size is bigger than 10, then delete them
 -

 Key: SPARK-5831
 URL: https://issues.apache.org/jira/browse/SPARK-5831
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: meiyoula
Priority: Minor
 Fix For: 1.4.0









[jira] [Commented] (SPARK-4010) Spark UI returns 500 in yarn-client mode

2015-02-16 Thread Marco Capuccini (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322851#comment-14322851
 ] 

Marco Capuccini commented on SPARK-4010:


Yes, and it seems to be fixed... but I still have the problem in Spark 1.2.1, 
and 1.2.0.

 Spark UI returns 500 in yarn-client mode 
 -

 Key: SPARK-4010
 URL: https://issues.apache.org/jira/browse/SPARK-4010
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.2.0
Reporter: Guoqiang Li
Assignee: Guoqiang Li
Priority: Blocker
 Fix For: 1.1.1, 1.2.0


 http://host/proxy/application_id/stages/   returns this result:
 {noformat}
 HTTP ERROR 500
 Problem accessing /proxy/application_1411648907638_0281/stages/. Reason:
 Connection refused
 Caused by:
 java.net.ConnectException: Connection refused
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
   at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
   at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
   at java.net.Socket.connect(Socket.java:579)
   at java.net.Socket.connect(Socket.java:528)
   at java.net.Socket.init(Socket.java:425)
   at java.net.Socket.init(Socket.java:280)
   at 
 org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
   at 
 org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
   at 
 org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
   at 
 org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:185)
   at 
 org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:336)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1183)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 

[jira] [Issue Comment Deleted] (SPARK-4010) Spark UI returns 500 in yarn-client mode

2015-02-16 Thread Marco Capuccini (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Capuccini updated SPARK-4010:
---
Comment: was deleted

(was: Yes, and it seems to be fixed... but I still have the problem in Spark 
1.2.1, and 1.2.0.)

 Spark UI returns 500 in yarn-client mode 
 -

 Key: SPARK-4010
 URL: https://issues.apache.org/jira/browse/SPARK-4010
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.2.0
Reporter: Guoqiang Li
Assignee: Guoqiang Li
Priority: Blocker
 Fix For: 1.1.1, 1.2.0


 http://host/proxy/application_id/stages/   returns this result:
 {noformat}
 HTTP ERROR 500
 Problem accessing /proxy/application_1411648907638_0281/stages/. Reason:
 Connection refused
 Caused by:
 java.net.ConnectException: Connection refused
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
   at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
   at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
   at java.net.Socket.connect(Socket.java:579)
   at java.net.Socket.connect(Socket.java:528)
   at java.net.Socket.init(Socket.java:425)
   at java.net.Socket.init(Socket.java:280)
   at 
 org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
   at 
 org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
   at 
 org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
   at 
 org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:185)
   at 
 org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:336)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1183)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 

[jira] [Resolved] (SPARK-1697) Driver error org.apache.spark.scheduler.TaskSetManager - Loss was due to java.io.FileNotFoundException

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1697.
--
Resolution: Duplicate

This is either stale, or likely the same issue identified in SPARK-2243

 Driver error org.apache.spark.scheduler.TaskSetManager - Loss was due to 
 java.io.FileNotFoundException
 --

 Key: SPARK-1697
 URL: https://issues.apache.org/jira/browse/SPARK-1697
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Reporter: Arup Malakar

 We are running spark-streaming 0.9.0 on top of YARN (Hadoop 
 2.2.0-cdh5.0.0-beta-2). It reads from Kafka and processes the data. So far we 
 haven't seen any issues, except today we saw an exception in the driver log 
 and it is no longer consuming Kafka messages. 
 Here is the exception we saw:
 {code}
 2014-05-01 10:00:43,962 [Result resolver thread-3] WARN  
 org.apache.spark.scheduler.TaskSetManager - Loss was due to 
 java.io.FileNotFoundException
 java.io.FileNotFoundException: http://10.50.40.85:53055/broadcast_2412
   at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
   at 
 org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
   at 
 org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
   at 
 

[jira] [Updated] (SPARK-5835) Unit test causes java.io.FileNotFoundException on localhost for file broadcast_1

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5835:
-
Component/s: Tests
   Priority: Minor  (was: Major)

You say you're not running in parallel, but are you creating multiple 
SparkContexts? If so, then this is the same as 
https://issues.apache.org/jira/browse/SPARK-2243

 Unit test causes java.io.FileNotFoundException on localhost for file 
 broadcast_1
 --

 Key: SPARK-5835
 URL: https://issues.apache.org/jira/browse/SPARK-5835
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.0.0
Reporter: sam
Priority: Minor

 Note, I do not believe this is related to SPARK-2984 since I have speculative 
 execution off (it's off by default in 1.0.0).
 I intermittently get the following stack trace in my unit tests. I'm using 
 specs2 and I have sequential in the tests (so should not be bumping into 
 each other), and also I have `parallelExecution in Test := false` in my 
 `build.sbt`.
 This isn't a major showstopper; it just means our CI pipelines need some 
 retry logic to work around the failing tests.
 [error] Could not run test my.test.Class: org.apache.spark.SparkException: 
 Job aborted due to stage failure: Task 4.0:0 failed 1 times, most recent 
 failure: Exception failure in TID 6 on host localhost: 
 java.io.FileNotFoundException: http://blar.blar.blar.blar:59528/broadcast_1
 [error] 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1834)
 [error] 
 sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:90)
 [error] 
 sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1431)
 [error] 
 sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1429)
 [error] java.security.AccessController.doPrivileged(Native Method)
 [error] 
 java.security.AccessController.doPrivileged(AccessController.java:713)
 [error] 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1428)
 [error] 
 org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:196)
 [error] 
 org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:89)
 [error] sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
 [error] 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 [error] java.lang.reflect.Method.invoke(Method.java:483)
 [error] 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
 [error] 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
 [error] 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 [error] 
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 [error] 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
 [error] 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
 [error] 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 [error] 
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 [error] 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
 [error] 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
 [error] 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 [error] 
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 [error] 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
 [error] 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
 [error] 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 [error] 
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 [error] 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
 [error] 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
 [error] 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 [error] 
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 [error] 
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 [error] 
 scala.collection.immutable.$colon$colon.readObject(List.scala:362)
 [error] sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
 [error] 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 [error] java.lang.reflect.Method.invoke(Method.java:483)
 [error] 
 

[jira] [Updated] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5770:
-
Priority: Minor  (was: Major)

 Use addJar() to upload a new jar file to executor, it can't be added to 
 classloader
 ---

 Key: SPARK-5770
 URL: https://issues.apache.org/jira/browse/SPARK-5770
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: meiyoula
Priority: Minor

 First use addJar() to upload a jar to the executor, then change the jar 
 content and upload it again. We can see that the jar file on the local disk has 
 been updated, but the classloader still loads the old one. The executor log has 
 no error or exception pointing to this.
 I used spark-shell to test it, with spark.files.overwrite set to true.






[jira] [Resolved] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5770.
--
Resolution: Won't Fix

PR was withdrawn

 Use addJar() to upload a new jar file to executor, it can't be added to 
 classloader
 ---

 Key: SPARK-5770
 URL: https://issues.apache.org/jira/browse/SPARK-5770
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: meiyoula
Priority: Minor

 First use addJar() to upload a jar to the executor, then change the jar 
 content and upload it again. We can see that the jar file on the local disk has 
 been updated, but the classloader still loads the old one. The executor log has 
 no error or exception pointing to this.
 I used spark-shell to test it, with spark.files.overwrite set to true.






[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2015-02-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322647#comment-14322647
 ] 

Sean Owen commented on SPARK-1867:
--

With Bjorn Jonsson here, we think we located the cause, at least for those 
people here using CDH 5.2. It seems to occur with the MR1 flavor, when using 
standalone mode (not YARN). The cause seemed to be that the MR1-flavored 
dependencies get on the classpath along with non-MR1 Hadoop dependencies, and

So I'm going to re-resolve as not a problem _with Spark_ but a particular 
packaging. Bjorn mentions it seems to be worked around by temporarily 
changing an env variable for Spark's jobs: setting {{HADOOP_CONF_DIR}} to 
{{/etc/hadoop/conf.cloudera.YARN-1}}. It doesn't seem to happen with 5.3.

This might explain why people got it to work by using a 'stock' distribution; 
in that case they'd not have any MR1 dependencies, I believe.

This also relates to https://issues.apache.org/jira/browse/SPARK-4048 which may 
have been the eventual fix; Marcelo might be able to comment.

(Also of interest, note that https://bugs.openjdk.java.net/browse/JDK-7172206 
may be causing the real underlying exception to be masked, which doesn't help)

I don't know if it's specific to the CDH 5.2 MR1 stuff, and it appears to be 
resolved later anyway. If so, let's re-close as NotAProblem since it's not to do 
with Spark, but I'll pause a beat for that.

 Spark Documentation Error causes java.lang.IllegalStateException: unread 
 block data
 ---

 Key: SPARK-1867
 URL: https://issues.apache.org/jira/browse/SPARK-1867
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: sam

 I've employed two System Administrators on a contract basis (for quite a bit 
 of money), and both contractors have independently hit the following 
 exception.  What we are doing is:
 1. Installing Spark 0.9.1 according to the documentation on the website, 
 along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
 2. Building a fat jar with a Spark app with sbt then trying to run it on the 
 cluster
 I've also included code snippets, and sbt deps at the bottom.
 When I've Googled this, there seems to be two somewhat vague responses:
 a) Mismatching spark versions on nodes/user code
 b) Need to add more jars to the SparkConf
 Now I know that (b) is not the problem having successfully run the same code 
 on other clusters while only including one jar (it's a fat jar).
 But I have no idea how to check for (a) - it appears Spark doesn't have any 
 version checks or anything - it would be nice if it checked versions and 
 threw a mismatching version exception: you have user code using version X 
 and node Y has version Z.
 I would be very grateful for advice on this.
 The exception:
 Exception in thread main org.apache.spark.SparkException: Job aborted: Task 
 0.0:1 failed 32 times (most recent failure: Exception failure: 
 java.lang.IllegalStateException: unread block data)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 14/05/16 18:05:31 INFO 

[jira] [Resolved] (SPARK-5830) Don't create unnecessary directory for local root dir

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5830.
--
Resolution: Duplicate

 Don't create unnecessary directory for local root dir
 -

 Key: SPARK-5830
 URL: https://issues.apache.org/jira/browse/SPARK-5830
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Weizhong
Priority: Minor

 Currently Spark creates an unnecessary directory under the local root directory, 
 and this directory will not be deleted after the application exits.
 For example:
 before, the tmp dir was created like /tmp/spark-UUID
 now, the tmp dir is created like /tmp/spark-UUID/spark-UUID
 so the dir /tmp/spark-UUID will not be deleted as a local root directory. 






[jira] [Created] (SPARK-5832) Add Affinity Propagation clustering algorithm

2015-02-16 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-5832:
--

 Summary: Add Affinity Propagation clustering algorithm
 Key: SPARK-5832
 URL: https://issues.apache.org/jira/browse/SPARK-5832
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Liang-Chi Hsieh









[jira] [Updated] (SPARK-5296) Predicate Pushdown (BaseRelation) to have an interface that will accept more filters

2015-02-16 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-5296:
--
Summary: Predicate Pushdown (BaseRelation) to have an interface that will 
accept more filters  (was: Predicate Pushdown (BaseRelation) to have an 
interface that will accept OR filters)

 Predicate Pushdown (BaseRelation) to have an interface that will accept more 
 filters
 

 Key: SPARK-5296
 URL: https://issues.apache.org/jira/browse/SPARK-5296
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Corey J. Nolet
Assignee: Cheng Lian
Priority: Critical

 Currently, the BaseRelation API allows a FilteredRelation to handle an 
 Array[Filter] which represents filter expressions that are applied as an AND 
 operator.
 We should support OR operations in a BaseRelation as well. I'm not sure what 
 this would look like in terms of API changes, but it almost seems like a 
 FilteredUnionedScan BaseRelation (the name stinks but you get the idea) would 
 be useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5296) Predicate Pushdown (BaseRelation) to have an interface that will accept more filters

2015-02-16 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322550#comment-14322550
 ] 

Cheng Lian commented on SPARK-5296:
---

Nested AND/OR/NOT filters can be processed in a way very similar to the Parquet 
filter push-down code.
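
As a rough illustration of that approach, nested filters could be translated recursively, along the lines of the Scala sketch below (the And/Or/Not case classes are assumed to be those added by the proposed change; the string-predicate target is purely illustrative, not Spark API):

{code}
import org.apache.spark.sql.sources._

object FilterTranslation {
  // Recursively translate a data source Filter into a simple SQL-like
  // predicate string; returns None when a (sub)filter cannot be pushed down,
  // leaving it for Spark to evaluate after the scan.
  def translate(filter: Filter): Option[String] = filter match {
    case EqualTo(attr, value)     => Some(s"$attr = '$value'")
    case GreaterThan(attr, value) => Some(s"$attr > '$value'")
    case LessThan(attr, value)    => Some(s"$attr < '$value'")
    case And(left, right) =>
      for (l <- translate(left); r <- translate(right)) yield s"($l AND $r)"
    case Or(left, right) =>
      for (l <- translate(left); r <- translate(right)) yield s"($l OR $r)"
    case Not(child) =>
      translate(child).map(c => s"(NOT $c)")
    case _ => None
  }
}
{code}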

 Predicate Pushdown (BaseRelation) to have an interface that will accept more 
 filters
 

 Key: SPARK-5296
 URL: https://issues.apache.org/jira/browse/SPARK-5296
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Corey J. Nolet
Assignee: Cheng Lian
Priority: Critical

 Currently, the BaseRelation API allows a FilteredRelation to handle an 
 Array[Filter] which represents filter expressions that are applied as an AND 
 operator.
 We should support OR operations in a BaseRelation as well. I'm not sure what 
 this would look like in terms of API changes, but it almost seems like a 
 FilteredUnionedScan BaseRelation (the name stinks but you get the idea) would 
 be useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5296) Predicate Pushdown (BaseRelation) to have an interface that will accept more filters

2015-02-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322556#comment-14322556
 ] 

Apache Spark commented on SPARK-5296:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/4623

 Predicate Pushdown (BaseRelation) to have an interface that will accept more 
 filters
 

 Key: SPARK-5296
 URL: https://issues.apache.org/jira/browse/SPARK-5296
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Corey J. Nolet
Assignee: Cheng Lian
Priority: Critical

 Currently, the BaseRelation API allows a FilteredRelation to handle an 
 Array[Filter] which represents filter expressions that are applied as an AND 
 operator.
 We should support OR operations in a BaseRelation as well. I'm not sure what 
 this would look like in terms of API changes, but it almost seems like a 
 FilteredUnionedScan BaseRelation (the name stinks but you get the idea) would 
 be useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module

2015-02-16 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322559#comment-14322559
 ] 

Littlestar commented on SPARK-3638:
---

Oh, it was introduced in the kinesis-asl profile only.
I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
other httpclient versions required by user apps.
Now I build Spark with the kinesis-asl profile, and it works with httpclient 4.2.6, 
thanks.

mvn dependency:tree



 Commons HTTP client dependency conflict in extras/kinesis-asl module
 

 Key: SPARK-3638
 URL: https://issues.apache.org/jira/browse/SPARK-3638
 Project: Spark
  Issue Type: Bug
  Components: Examples, Streaming
Affects Versions: 1.1.0
Reporter: Aniket Bhatnagar
  Labels: dependencies
 Fix For: 1.1.1, 1.2.0


 Followed instructions as mentioned @ 
 https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md
  and when running the example, I get the following error:
 {code}
 Caused by: java.lang.NoSuchMethodError: 
 org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
 at 
 com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
 at 
 com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
 at 
 com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
 at 
 com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
 at 
 com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
 {code}
 I believe this is due to the dependency conflict as described @ 
 http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module

2015-02-16 Thread Aniket Bhatnagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322518#comment-14322518
 ] 

Aniket Bhatnagar commented on SPARK-3638:
-

Did you build spark with kinesis-asl profile? The standard distribution does 
not have this profile and therefore you would have to roll your own as 
described in 
https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md
 (mvn -Pkinesis-asl -DskipTests clean package). 

 Commons HTTP client dependency conflict in extras/kinesis-asl module
 

 Key: SPARK-3638
 URL: https://issues.apache.org/jira/browse/SPARK-3638
 Project: Spark
  Issue Type: Bug
  Components: Examples, Streaming
Affects Versions: 1.1.0
Reporter: Aniket Bhatnagar
  Labels: dependencies
 Fix For: 1.1.1, 1.2.0


 Followed instructions as mentioned @ 
 https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md
  and when running the example, I get the following error:
 {code}
 Caused by: java.lang.NoSuchMethodError: 
 org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
 at 
 com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
 at 
 com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
 at 
 com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
 at 
 com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
 at 
 com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
 at 
 com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
 {code}
 I believe this is due to the dependency conflict as described @ 
 http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5832) Add Affinity Propagation clustering algorithm

2015-02-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322528#comment-14322528
 ] 

Apache Spark commented on SPARK-5832:
-

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/4622

 Add Affinity Propagation clustering algorithm
 -

 Key: SPARK-5832
 URL: https://issues.apache.org/jira/browse/SPARK-5832
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Liang-Chi Hsieh





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5804) Explicitly manage cache in Crossvalidation k-fold loop

2015-02-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5804:
-
Assignee: Peter Rudenko

 Explicitly manage cache in Crossvalidation k-fold loop
 --

 Key: SPARK-5804
 URL: https://issues.apache.org/jira/browse/SPARK-5804
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 1.3.0
Reporter: Peter Rudenko
Assignee: Peter Rudenko
Priority: Minor
 Fix For: 1.3.0


 On a big dataset, explicitly unpersisting the train and validation folds allows 
 more data to be loaded into memory in the next loop iteration. On my environment 
 (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross 
 validation), this saved more than 5 minutes.
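
A minimal sketch of this idea (not the actual patch), using MLUtils.kFold; the evaluate function passed in stands in for model training and scoring:

{code}
import scala.reflect.ClassTag

import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// Persist each fold while it is in use and unpersist it before the next
// iteration, so the freed memory is available to the following folds.
def crossValidate[T: ClassTag](
    data: RDD[T],
    numFolds: Int,
    seed: Int,
    evaluate: (RDD[T], RDD[T]) => Double): Seq[Double] = {
  MLUtils.kFold(data, numFolds, seed).map { case (training, validation) =>
    training.persist()
    validation.persist()
    val metric = evaluate(training, validation)
    training.unpersist()
    validation.unpersist()
    metric
  }.toSeq
}
{code}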



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5767) Migrate Parquet data source to the write support of data source API

2015-02-16 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved SPARK-5767.
---
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4563
[https://github.com/apache/spark/pull/4563]

 Migrate Parquet data source to the write support of data source API
 ---

 Key: SPARK-5767
 URL: https://issues.apache.org/jira/browse/SPARK-5767
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Cheng Lian
Assignee: Cheng Lian
 Fix For: 1.3.0


 Migrate to the newly introduced data source write support API (SPARK-5658). 
 Add support for overwriting and appending to existing tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4553) query for parquet table with string fields in spark sql hive get binary result

2015-02-16 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved SPARK-4553.
---
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4563
[https://github.com/apache/spark/pull/4563]

 query for parquet table with string fields in spark sql hive get binary result
 --

 Key: SPARK-4553
 URL: https://issues.apache.org/jira/browse/SPARK-4553
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Fei Wang
Assignee: Cheng Lian
Priority: Blocker
 Fix For: 1.3.0


 run 
 create table test_parquet(key int, value string) stored as parquet;
 insert into table test_parquet select * from src;
 select * from test_parquet;
 and get a result like the following:
 ...
 282 [B@38fda3b
 138 [B@1407a24
 238 [B@12de6fb
 419 [B@6c97695
 15 [B@4885067
 118 [B@156a8d3
 72 [B@65d20dd
 90 [B@4c18906
 307 [B@60b24cc
 19 [B@59cf51b
 435 [B@39fdf37
 10 [B@4f799d7
 277 [B@3950951
 273 [B@596bf4b
 306 [B@3e91557
 224 [B@3781d61
 309 [B@2d0d128
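
For reference, a possible workaround sketch on affected versions is to make the Parquet support interpret un-annotated binary columns as strings via the spark.sql.parquet.binaryAsString option; whether it covers this particular Hive-written table is an assumption:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("binary-as-string-workaround"))
val hiveContext = new HiveContext(sc)

// Treat binary Parquet columns without a string annotation as strings
// before querying the Hive-created Parquet table.
hiveContext.setConf("spark.sql.parquet.binaryAsString", "true")
hiveContext.sql("SELECT * FROM test_parquet").collect().foreach(println)
{code}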



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old

2015-02-16 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5834:
--
Description: 
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that the official package only uses httpclient 4.1.2.

some spark module requires httpclient 4.2 and above.
https://github.com/apache/spark/pull/2489/files 
(<commons.httpclient.version>4.2</commons.httpclient.version>)
https://github.com/apache/spark/pull/2535/files 
(<commons.httpclient.version>4.2.6</commons.httpclient.version>)

I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
other httpclient versions required by user apps.


  was:
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that the official package only uses httpclient 4.1.2.

some spark module required httpclient 4.2 and above.
https://github.com/apache/spark/pull/2489/files 
(<commons.httpclient.version>4.2</commons.httpclient.version>)
https://github.com/apache/spark/pull/2535/files 
(<commons.httpclient.version>4.2.6</commons.httpclient.version>)

I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
other httpclient versions required by user apps.



 spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
 --

 Key: SPARK-5834
 URL: https://issues.apache.org/jira/browse/SPARK-5834
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

  I see 
 spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
  It indicates that the official package only uses httpclient 4.1.2.
 some spark module requires httpclient 4.2 and above.
 https://github.com/apache/spark/pull/2489/files 
 (<commons.httpclient.version>4.2</commons.httpclient.version>)
 https://github.com/apache/spark/pull/2535/files 
 (<commons.httpclient.version>4.2.6</commons.httpclient.version>)
 I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
 other httpclient versions required by user apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5833) Adds REFRESH TABLE command to refresh external data sources tables

2015-02-16 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-5833:
--
Description: This command can be used to refresh (possibly cached) metadata 
stored in external data source tables. For example, for JSON tables, it forces 
schema inference; for Parquet tables, it forces schema merging and partition 
discovery.
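
A hypothetical usage sketch, assuming the command lands in the SQL dialect as proposed (the table name is illustrative):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("refresh-table-sketch"))
val sqlContext = new SQLContext(sc)

// Re-read the (possibly cached) metadata of an external data source table
// after its underlying files change, forcing schema inference / partition
// discovery to run again.
sqlContext.sql("REFRESH TABLE json_events")
{code}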

 Adds REFRESH TABLE command to refresh external data sources tables
 --

 Key: SPARK-5833
 URL: https://issues.apache.org/jira/browse/SPARK-5833
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Cheng Lian

 This command can be used to refresh (possibly cached) metadata stored in 
 external data source tables. For example, for JSON tables, it forces schema 
 inference; for Parquet tables, it forces schema merging and partition 
 discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5745) Allow to use custom TaskMetrics implementation

2015-02-16 Thread Jacek Lewandowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322594#comment-14322594
 ] 

Jacek Lewandowski commented on SPARK-5745:
--

Thanks [~pwendell] for your reply.

The primary goal is to associate with the task some additional data which can 
be collected by some driver-side listener afterwards. The data which I'd like 
to collect is not accessible to the user directly - say, I want to collect the 
number of rows fetched from the database, or the number of batches written to 
the database. These values are known inside the job code and can be easily 
reported to task metrics (just like the number of read/written bytes are 
reported now). 

If I understand the idea of accumulators correctly, although the accumulator is 
a great feature for application-specific metrics, I don't really know how to 
use them to collect metrics which are more general - like RDD / job execution 
metrics, which are a part of an intermediate framework or a library. 
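
For what it's worth, a minimal sketch of the accumulator route discussed above, assuming the 1.x SparkContext.accumulator API (the rows-fetched counter and the data source are illustrative, not an existing connector):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object RowsFetchedSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rows-fetched-sketch"))

    // Driver-side counter that executors add to from inside the job code,
    // much like read/written bytes are reported to TaskMetrics today.
    val rowsFetched = sc.accumulator(0L, "rows fetched from the database")

    sc.parallelize(1 to 1000, 4).foreachPartition { partition =>
      // Imagine each partition fetching rows from the database here.
      rowsFetched += partition.size.toLong
    }

    // Only the driver can read the aggregated value, e.g. from a listener.
    println(s"rows fetched: ${rowsFetched.value}")
    sc.stop()
  }
}
{code}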


 Allow to use custom TaskMetrics implementation
 --

 Key: SPARK-5745
 URL: https://issues.apache.org/jira/browse/SPARK-5745
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Reporter: Jacek Lewandowski

 There can be various RDDs implemented and the {{TaskMetrics}} provides a 
 great API for collecting metrics and aggregating them. However some RDDs may 
 want to register some custom metrics and the current implementation doesn't 
 allow for this (for example the number of read rows or whatever).
 I suppose that this can be changed without modifying the whole interface - 
 a factory could be used to create the initial {{TaskMetrics}} object. 
 The default factory could be overridden by the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-16 Thread Florian Verhein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322611#comment-14322611
 ] 

Florian Verhein commented on SPARK-5813:


I think it's a good idea to stick to vendor recommendations, but since I can't 
point to any concrete benefits and there is complexity around handling 
licensing issues, I don't think there's a good argument for tackling this.

 Spark-ec2: Switch to OracleJDK
 --

 Key: SPARK-5813
 URL: https://issues.apache.org/jira/browse/SPARK-5813
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Florian Verhein
Priority: Minor

 Currently using OpenJDK; however, it is generally recommended to use the Oracle 
 JDK, especially for Hadoop deployments, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old

2015-02-16 Thread Littlestar (JIRA)
Littlestar created SPARK-5834:
-

 Summary: spark 1.2.1 official package bundled with httpclient 
4.1.2 is too old
 Key: SPARK-5834
 URL: https://issues.apache.org/jira/browse/SPARK-5834
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.2.1
Reporter: Littlestar


In assembly-1.1.1-hadoop2.4.0.jar the class HttpPatch is not there, which was 
introduced in 4.2.
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that the official package only uses httpclient 4.1.2.

some spark module required httpclient 4.2 and above.
https://github.com/apache/spark/pull/2489/files 
(<commons.httpclient.version>4.2</commons.httpclient.version>)
https://github.com/apache/spark/pull/2535/files 
(<commons.httpclient.version>4.2.6</commons.httpclient.version>)

I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
other httpclient versions required by user apps.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old

2015-02-16 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated SPARK-5834:
--
Description: 
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that the official package only uses httpclient 4.1.2.

some spark module required httpclient 4.2 and above.
https://github.com/apache/spark/pull/2489/files 
(<commons.httpclient.version>4.2</commons.httpclient.version>)
https://github.com/apache/spark/pull/2535/files 
(<commons.httpclient.version>4.2.6</commons.httpclient.version>)

I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
other httpclient versions required by user apps.


  was:
In assembly-1.1.1-hadoop2.4.0.jar the class HttpPatch is not there, which was 
introduced in 4.2.
 I see 
spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
 It indicates that the official package only uses httpclient 4.1.2.

some spark module required httpclient 4.2 and above.
https://github.com/apache/spark/pull/2489/files 
(<commons.httpclient.version>4.2</commons.httpclient.version>)
https://github.com/apache/spark/pull/2535/files 
(<commons.httpclient.version>4.2.6</commons.httpclient.version>)

I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
other httpclient versions required by user apps.


   Priority: Minor  (was: Major)

 spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
 --

 Key: SPARK-5834
 URL: https://issues.apache.org/jira/browse/SPARK-5834
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

  I see 
 spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
  It indicates that the official package only uses httpclient 4.1.2.
 some spark module required httpclient 4.2 and above.
 https://github.com/apache/spark/pull/2489/files 
 (<commons.httpclient.version>4.2</commons.httpclient.version>)
 https://github.com/apache/spark/pull/2535/files 
 (<commons.httpclient.version>4.2.6</commons.httpclient.version>)
 I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
 other httpclient versions required by user apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5833) Adds REFRESH TABLE command to refresh external data sources tables

2015-02-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322595#comment-14322595
 ] 

Apache Spark commented on SPARK-5833:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/4624

 Adds REFRESH TABLE command to refresh external data sources tables
 --

 Key: SPARK-5833
 URL: https://issues.apache.org/jira/browse/SPARK-5833
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Cheng Lian

 This command can be used to refresh (possibly cached) metadata stored in 
 external data source tables. For example, for JSON tables, it forces schema 
 inference; for Parquet tables, it forces schema merging and partition 
 discovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-16 Thread Florian Verhein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florian Verhein closed SPARK-5813.
--
Resolution: Won't Fix

 Spark-ec2: Switch to OracleJDK
 --

 Key: SPARK-5813
 URL: https://issues.apache.org/jira/browse/SPARK-5813
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Florian Verhein
Priority: Minor

 Currently using OpenJDK; however, it is generally recommended to use the Oracle 
 JDK, especially for Hadoop deployments, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-02-16 Thread Beniamino (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322614#comment-14322614
 ] 

Beniamino commented on SPARK-2344:
--

Hi everybody,

I'm currently working on the Fuzzy C Means implementation too.
I have a first draft of my code here: 
https://github.com/bdelpizzo/mllib-extension/blob/master/clustering/FCM.scala

I'm still working on it. I would really appreciate any suggestions.
Thanks

 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented, and they 
 differ only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.
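
As a small illustration of that membership degree (not the proposed MLlib code), the standard FCM update assigns each point a weight per cluster from its distances to the cluster centers, controlled by a fuzziness parameter m > 1:

{code}
// Compute fuzzy memberships u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)) for one
// point, given its distances to every cluster center; memberships sum to 1.
def memberships(distances: Array[Double], m: Double = 2.0): Array[Double] = {
  require(m > 1.0, "the fuzziness parameter m must be greater than 1")
  val exponent = 2.0 / (m - 1.0)
  val zero = distances.indexWhere(_ == 0.0)
  if (zero >= 0) {
    // The point coincides with a center: hard-assign it to that cluster.
    distances.indices.map(i => if (i == zero) 1.0 else 0.0).toArray
  } else {
    distances.map(dj => 1.0 / distances.map(dk => math.pow(dj / dk, exponent)).sum)
  }
}
{code}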



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5834.
--
Resolution: Not a Problem

Spark doesn't actually use HttpClient at all; its dependencies do. You're 
looking at a dependency update specific to the Kinesis ASL build, which is not 
enabled in the build you downloaded.

You would not depend on Spark's copy of this lib anyway. You depend on the 
version you need.
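
For example, a user application's build.sbt can declare the httpclient version it needs alongside a provided Spark dependency (versions here are illustrative):

{code}
// build.sbt sketch: the application pins its own httpclient version instead of
// relying on whatever the Spark assembly happens to bundle.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.1" % "provided",
  "org.apache.httpcomponents" % "httpclient" % "4.2.6"
)
{code}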

 spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
 --

 Key: SPARK-5834
 URL: https://issues.apache.org/jira/browse/SPARK-5834
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

  I see 
 spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties
  It indicates that the official package only uses httpclient 4.1.2.
 some spark module requires httpclient 4.2 and above.
 https://github.com/apache/spark/pull/2489/files 
 (<commons.httpclient.version>4.2</commons.httpclient.version>)
 https://github.com/apache/spark/pull/2535/files 
 (<commons.httpclient.version>4.2.6</commons.httpclient.version>)
 I think httpclient 4.1.2 is too old; the standard distribution may conflict with 
 other httpclient versions required by user apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-02-16 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322658#comment-14322658
 ] 

Alex commented on SPARK-2344:
-

Hi,

I'm also working on the implementation of FCM.
You can find my work here:
https://github.com/salexln/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/clustering



Alex


 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented, and they 
 differ only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5829) JavaStreamingContext.fileStream run task loop repeated empty when no more new files found

2015-02-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5829.
--
Resolution: Duplicate

Same as SPARK-3228, which is WontFix. The behavior is intended. You can actually 
copy the saveAs* functions and change them to get the behavior you 
want pretty easily.
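
For example, a minimal Scala sketch along those lines (paths are taken from the report; textFileStream stands in for the fileStream call), writing a batch directory only when the batch actually contains data:

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SkipEmptyBatches {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SkipEmptyBatches")
    val ssc = new StreamingContext(conf, Seconds(30))

    val lines = ssc.textFileStream("/testspark/watchdir")
    lines.foreachRDD { (rdd, time) =>
      // Skip batches with no new data instead of writing empty directories.
      if (rdd.take(1).nonEmpty) {
        rdd.saveAsTextFile(s"/testspark/resultdir/output-${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}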

 JavaStreamingContext.fileStream run task loop repeated  empty when no more 
 new files found
 --

 Key: SPARK-5829
 URL: https://issues.apache.org/jira/browse/SPARK-5829
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.1
 Environment: spark master (1.3.0) with SPARK-5826 patch.
Reporter: Littlestar
Priority: Minor

 spark master (1.3.0) with the SPARK-5826 patch.
 JavaStreamingContext.fileStream runs repeated empty tasks when no more new files are found.
 To reproduce:
   1. mkdir /testspark/watchdir on HDFS.
   2. run the app.
   3. put some text files into /testspark/watchdir.
 Every 30 seconds, the spark log indicates that a new sub task runs,
 and /testspark/resultdir/ gets a new directory with empty files every 30 seconds.
 Even when no new files are added, it still runs a new task with an empty RDD.
 {noformat}
 package my.test.hadoop.spark;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.streaming.Durations;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;

 import scala.Tuple2;

 public class TestStream {

   @SuppressWarnings({ "serial", "resource" })
   public static void main(String[] args) throws Exception {
     SparkConf conf = new SparkConf().setAppName("TestStream");
     JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
     jssc.checkpoint("/testspark/checkpointdir");

     Configuration jobConf = new Configuration();
     jobConf.set("my.test.fields", "fields");

     JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
         LongWritable.class, Text.class, TextInputFormat.class,
         new Function<Path, Boolean>() {
           @Override
           public Boolean call(Path v1) throws Exception {
             return true;
           }
         }, true, jobConf).mapToPair(
         new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
           @Override
           public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
             return new Tuple2<Integer, Integer>(1, 1);
           }
         });

     JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
       @Override
       public Integer call(Integer arg0, Integer arg1) throws Exception {
         return arg0 + arg1;
       }
     });

     rs.checkpoint(Durations.seconds(60));
     rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
         Integer.class, Integer.class, TextOutputFormat.class);

     jssc.start();
     jssc.awaitTermination();
   }
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5835) Unit test causes java.io.FileNotFoundException on localhost for file broadcast_1

2015-02-16 Thread sam (JIRA)
sam created SPARK-5835:
--

 Summary: Unit test causes java.io.FileNotFoundException on 
localhost for file broadcast_1
 Key: SPARK-5835
 URL: https://issues.apache.org/jira/browse/SPARK-5835
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: sam


Note, I do not believe this is related to SPARK-2984 since I have speculative 
execution off (it's off by default in 1.0.0).

I intermittently get the following stack trace in my unit tests. I'm using 
specs2 and I have sequential in the tests (so should not be bumping into each 
other), and also I have `parallelExecution in Test := false` in my `build.sbt`.

This isn't a major showstopper; it just means our CI pipelines need some 
retry logic to work around the failing tests.

[error] Could not run test my.test.Class: org.apache.spark.SparkException: Job 
aborted due to stage failure: Task 4.0:0 failed 1 times, most recent failure: 
Exception failure in TID 6 on host localhost: java.io.FileNotFoundException: 
http://blar.blar.blar.blar:59528/broadcast_1
[error] 
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1834)
[error] 
sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:90)
[error] 
sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1431)
[error] 
sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1429)
[error] java.security.AccessController.doPrivileged(Native Method)
[error] 
java.security.AccessController.doPrivileged(AccessController.java:713)
[error] 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1428)
[error] 
org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:196)
[error] 
org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:89)
[error] sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
[error] 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] java.lang.reflect.Method.invoke(Method.java:483)
[error] 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
[error] 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
[error] 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
[error] 
scala.collection.immutable.$colon$colon.readObject(List.scala:362)
[error] sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
[error] 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] java.lang.reflect.Method.invoke(Method.java:483)
[error] 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
[error] 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
[error] 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] 
