[jira] [Commented] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056969#comment-15056969
 ] 

Shivaram Venkataraman commented on SPARK-12327:
---

cc [~felixcheung] This is related to SPARK-11263

> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> R/deserialize.R:30:3: style: Commented code should be removed.
> # Date -> Date
>   ^~~~
> R/deserialize.R:31:3: style: Commented code should be removed.
> # Time -> POSIXct
>   ^~~
> 

[jira] [Closed] (SPARK-12275) No plan for BroadcastHint in some condition

2015-12-14 Thread yucai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yucai closed SPARK-12275.
-

I verified the PR has fixed this issue.

> No plan for BroadcastHint in some condition
> ---
>
> Key: SPARK-12275
> URL: https://issues.apache.org/jira/browse/SPARK-12275
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: yucai
>Assignee: yucai
>  Labels: backport-needed
> Fix For: 1.5.3, 1.6.1, 2.0.0
>
>
> *Summary*
> No plan for BroadcastHint is generated in some condition.
> *Test Case*
> {code}
> val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value")
> val parquetTempFile =
>   "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), 
> scala.util.Random.nextInt)
> df1.write.parquet(parquetTempFile)
> val pf1 = sqlContext.read.parquet(parquetTempFile)
> #1. df1.join(broadcast(pf1)).count()
> #2. broadcast(pf1).count()
> {code}
> *Result*
> It will trigger assertion in QueryPlanner.scala, like below:
> {code}
> scala> df1.join(broadcast(pf1)).count()
> java.lang.AssertionError: assertion failed: No plan for BroadcastHint
> +- Relation[key#6,value#7] 
> ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet]
>   at scala.Predef$.assert(Predef.scala:179)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
> {code}
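For readers less familiar with the hint, a minimal usage sketch (assuming Spark 1.5+, where broadcast lives in org.apache.spark.sql.functions); the bug above is in planning the hinted relation, not in how the hint is called:

{code}
import org.apache.spark.sql.functions.broadcast

// Typical use of the hint: mark the smaller side so the planner chooses a
// broadcast join. With the bug above, planning the hinted parquet-backed
// relation asserts with "No plan for BroadcastHint" instead.
df1.join(broadcast(pf1), df1("key") === pf1("key")).count()
{code}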






[jira] [Updated] (SPARK-12331) R^2 for regression through the origin

2015-12-14 Thread Imran Younus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Younus updated SPARK-12331:
-
Description: 
The value of R^2 (coefficient of determination) obtained from 
LinearRegressionModel is not consistent with R and statsmodels when the 
fitIntercept is false i.e., regression through the origin. In this case, both R 
and statsmodels use the definition of R^2 given by eq(4') in the following 
review paper:

https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf

Here is the definition from this paper:
R^2 = \sum_i \hat{y}_i^2 / \sum_i y_i^2

The paper also describes why this should be the case. I've double checked that 
the value of R^2 from statsmodels and R are consistent with this definition. On 
the other hand, scikit-learn doesn't use the above definition. I would 
recommend using the above definition in Spark.
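To make the definition concrete, here is a minimal, self-contained sketch (made-up numbers, plain Scala, no Spark API) that fits a no-intercept line by least squares and evaluates R^2 with the formula above:

{code}
// Illustrative only: R^2 for regression through the origin,
// R^2 = sum(yhat_i^2) / sum(y_i^2)
val x = Array(1.0, 2.0, 3.0, 4.0)
val y = Array(2.1, 3.9, 6.2, 7.8)

// Least-squares slope with no intercept: beta = sum(x*y) / sum(x*x)
val beta = x.zip(y).map { case (a, b) => a * b }.sum / x.map(v => v * v).sum
val yhat = x.map(_ * beta)

val r2 = yhat.map(v => v * v).sum / y.map(v => v * v).sum
println(f"R^2 through the origin = $r2%.4f")
{code}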


  was:
The value of R^2 (coefficient of determination) obtained from 
LinearRegressionModel is not consistent with R and statsmodels when the 
fitIntercept is false i.e., regression through the origin. In this case, both R 
and statsmodels use the definition of R^2 given by eq(4') in the following 
review paper:

https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf

Here is the definition from this paper:
R^2 = \sum_i \hat{y}_i^2 / \sum_i y_i^2

The paper also describes why this should be the case. I've double checked that 
the value of R^2 from statsmodels and R are consistent with this definition. On 
the other hand, scikit-learn doesn't use the above definition. I would 
recommend using the above definition in Spark.



> R^2 for regression through the origin
> -
>
> Key: SPARK-12331
> URL: https://issues.apache.org/jira/browse/SPARK-12331
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Imran Younus
>Priority: Minor
>
> The value of R^2 (coefficient of determination) obtained from 
> LinearRegressionModel is not consistent with R and statsmodels when the 
> fitIntercept is false i.e., regression through the origin. In this case, both 
> R and statsmodels use the definition of R^2 given by eq(4') in the following 
> review paper:
> https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf
> Here is the definition from this paper:
> R^2 = \sum_i \hat{y}_i^2 / \sum_i y_i^2
> The paper also describes why this should be the case. I've double checked 
> that the value of R^2 from statsmodels and R are consistent with this 
> definition. On the other hand, scikit-learn doesn't use the above definition. 
> I would recommend using the above definition in Spark.






[jira] [Created] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2015-12-14 Thread Ashwin Shankar (JIRA)
Ashwin Shankar created SPARK-12329:
--

 Summary: spark-sql prints out SET commands to stdout instead of 
stderr
 Key: SPARK-12329
 URL: https://issues.apache.org/jira/browse/SPARK-12329
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Ashwin Shankar
Priority: Minor


When I run "$spark-sql -f ", I see that few "SET key value" messages 
get printed on stdout instead of stderr. These messages should go to stderr.






[jira] [Commented] (SPARK-12307) ParquetFormat options should be exposed through the DataFrameReader/Writer options API

2015-12-14 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057141#comment-15057141
 ] 

holdenk commented on SPARK-12307:
-

I hear you, and I agree we probably don't want to wholesale expose all of the 
different Parquet options - this ticket is just about the options that we can 
already set globally and that are part of the user docs (see 
http://spark.apache.org/docs/1.5.2/sql-programming-guide.html#configuration ). 
We could add more options for the underlying Parquet reader/writer later, but 
for now I think just making it easy to set the Spark-specific Parquet options 
should be reasonable.
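To make the contrast concrete, a rough sketch (assuming Spark 1.5; the per-write option name is hypothetical, not an existing API) of the difference between the current global setting and what this issue asks for:

{code}
// Today: Parquet settings such as the compression codec are global SQL confs.
// (df here stands for any existing DataFrame.)
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")
df.write.parquet("/tmp/out-global")

// Proposed direction (sketch only; "compression" as a per-write key is
// hypothetical here): let an individual write override the global conf
// through the existing DataFrameWriter options API.
df.write.option("compression", "gzip").parquet("/tmp/out-per-write")
{code}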

> ParquetFormat options should be exposed through the DataFrameReader/Writer 
> options API
> --
>
> Key: SPARK-12307
> URL: https://issues.apache.org/jira/browse/SPARK-12307
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: holdenk
>Priority: Trivial
>
> Currently many options for loading/saving Parquet need to be set globally on 
> the SparkContext. It would be useful to also provide support for setting 
> these options through the DataFrameReader/DataFrameWriter.






[jira] [Assigned] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12329:


Assignee: (was: Apache Spark)

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.






[jira] [Created] (SPARK-12331) R^2 for regression through the origin

2015-12-14 Thread Imran Younus (JIRA)
Imran Younus created SPARK-12331:


 Summary: R^2 for regression through the origin
 Key: SPARK-12331
 URL: https://issues.apache.org/jira/browse/SPARK-12331
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Imran Younus
Priority: Minor


The value of R^2 (coefficient of determination) obtained from 
LinearRegressionModel is not consistent with R and statsmodels when the 
fitIntercept is false i.e., regression through the origin. In this case, both R 
and statsmodels use the definition of R^2 given by eq(4') in the following 
review paper:

https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf

Here is the definition from this paper:
R^2 = \sum_i \hat{y}_i^2 / \sum_i y_i^2

The paper also describes why this should be the case. I've double checked that 
the value of R^2 from statsmodels and R are consistent with this definition. On 
the other hand, scikit-learn doesn't use the above definition. I would 
recommend using the above definition in Spark.







[jira] [Comment Edited] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056995#comment-15056995
 ] 

Felix Cheung edited comment on SPARK-12327 at 12/14/15 11:51 PM:
-

"# void -> NULL" is another example - they are not code

We could tag them with # nolint



was (Author: felixcheung):
"# void -> NULL" is another example - they are not code


> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Assignee: Apache Spark
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> 

[jira] [Assigned] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12327:


Assignee: Apache Spark

> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Assignee: Apache Spark
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> R/deserialize.R:30:3: style: Commented code should be removed.
> # Date -> Date
>   ^~~~
> R/deserialize.R:31:3: style: Commented code should be removed.
> # Time -> POSIXct
>   ^~~
> R/deserialize.R:33:3: style: Commented code 

[jira] [Commented] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056995#comment-15056995
 ] 

Felix Cheung commented on SPARK-12327:
--

"# void -> NULL" is another example - they are not code


> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> R/deserialize.R:30:3: style: Commented code should be removed.
> # Date -> Date
>   ^~~~
> R/deserialize.R:31:3: style: Commented code should be removed.
> # Time -> POSIXct
>   ^~~
> 

[jira] [Commented] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057033#comment-15057033
 ] 

Shivaram Venkataraman commented on SPARK-12327:
---

Yeah [~felixcheung] I think we could tag these blocks of code with #nolint. 
It'll also be good to open an issue with the lint-r project to see if they can 
address these false positives.

> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> R/deserialize.R:30:3: style: Commented code should be removed.
> # Date 

[jira] [Created] (SPARK-12328) Add connectionEstablished callback to RpcHandler to monitor the new connections

2015-12-14 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-12328:


 Summary: Add connectionEstablished callback to RpcHandler to 
monitor the new connections
 Key: SPARK-12328
 URL: https://issues.apache.org/jira/browse/SPARK-12328
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.6.0
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu


`NettyRpcHandler` uses a `ConcurrentHashMap` of clients to remember whether it has 
already received messages from a client, so that it does not fire 
`RemoteProcessConnected` multiple times.

We can add a callback to the RpcHandler so that this ConcurrentHashMap can be 
removed from NettyRpcHandler.
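A purely illustrative sketch (hypothetical names, not Spark's actual RpcHandler interface) of the kind of callback this issue proposes:

{code}
// Hypothetical interface sketch: if the transport layer told the handler when
// a channel is established, NettyRpcHandler would not need a ConcurrentHashMap
// of clients just to detect the first message from each remote process.
trait ConnectionAwareRpcHandler {
  /** Called once when a new remote client connects. */
  def connectionEstablished(remoteAddress: java.net.InetSocketAddress): Unit

  /** Called for each inbound message; no per-client bookkeeping needed here. */
  def receive(remoteAddress: java.net.InetSocketAddress, message: Array[Byte]): Unit
}
{code}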






[jira] [Commented] (SPARK-12328) Add connectionEstablished callback to RpcHandler to monitor the new connections

2015-12-14 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057080#comment-15057080
 ] 

Shixiong Zhu commented on SPARK-12328:
--

Oh. Right. Forgot it.

> Add connectionEstablished callback to RpcHandler to monitor the new 
> connections
> ---
>
> Key: SPARK-12328
> URL: https://issues.apache.org/jira/browse/SPARK-12328
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> `NettyRpcHandler` uses a `ConcurrentHashMap` of clients to remember whether it has 
> already received messages from a client, so that it does not fire 
> `RemoteProcessConnected` multiple times.
> We can add a callback to the RpcHandler so that this ConcurrentHashMap can be 
> removed from NettyRpcHandler.






[jira] [Commented] (SPARK-12219) Spark 1.5.2 code does not build on Scala 2.11.7 with SBT assembly

2015-12-14 Thread Rodrigo Boavida (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057133#comment-15057133
 ] 

Rodrigo Boavida commented on SPARK-12219:
-

[~srowen] I just built and ran the 1.6 branch successfully off the cluster. Going 
to run it on the cluster tomorrow just to double-check that the runtime is in good 
condition as well, and will let you know.

Tnks,
Rod

> Spark 1.5.2 code does not build on Scala 2.11.7 with SBT assembly
> -
>
> Key: SPARK-12219
> URL: https://issues.apache.org/jira/browse/SPARK-12219
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.2
>Reporter: Rodrigo Boavida
>
> I've tried with no success to build Spark on Scala 2.11.7. I'm getting build 
> errors using sbt due to the issues found in the below thread in July of this 
> year.
> https://mail-archives.apache.org/mod_mbox/spark-dev/201507.mbox/%3CCA+3qhFSJGmZToGmBU1=ivy7kr6eb7k8t6dpz+ibkstihryw...@mail.gmail.com%3E
> Seems some minor fixes are needed to make the Scala 2.11 compiler happy.
> I needed to build with SBT, as suggested in the thread below, to get around an 
> apparent Maven shade plugin issue which changed some classes when I changed 
> to Akka 2.4.0.
> https://groups.google.com/forum/#!topic/akka-user/iai6whR6-xU
> I've set this bug to Major priority assuming that the Spark community wants 
> to keep fully supporting SBT builds, including the Scala 2.11 compatibility.
> Tnks,
> Rod






[jira] [Commented] (SPARK-11460) Locality waits should be based on task set creation time, not last launch time

2015-12-14 Thread Kay Ousterhout (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057203#comment-15057203
 ] 

Kay Ousterhout commented on SPARK-11460:


For the specific issue mentioned in the description, can you set 
spark.locality.wait.rack to 0 (is that what you're already doing)?  Does that 
cause other issues?

I commented on the more general issue in the pull request.
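For reference, a minimal sketch of that workaround (spark.locality.wait.rack is the standard config key; the app name and values here are just placeholders):

{code}
import org.apache.spark.SparkConf

// Skip the RACK_LOCAL wait entirely so the scheduler falls through to ANY
// immediately; spark.locality.wait still governs the PROCESS/NODE levels.
val conf = new SparkConf()
  .setAppName("locality-wait-workaround")
  .set("spark.locality.wait.rack", "0")
{code}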

> Locality waits should be based on task set creation time, not last launch time
> --
>
> Key: SPARK-11460
> URL: https://issues.apache.org/jira/browse/SPARK-11460
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.2.2, 
> 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0, 1.5.1
> Environment: YARN
>Reporter: Shengyue Ji
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Spark waits for the spark.locality.wait period before going from RACK_LOCAL to 
> ANY when selecting an executor for assignment. The timeout is essentially 
> reset each time a new assignment is made.
> We were running Spark streaming on Kafka with a 10 second batch window on 32 
> Kafka partitions with 16 executors. All executors were in the ANY group. At 
> one point one RACK_LOCAL executor was added and all tasks were assigned to 
> it. Each task took about 0.6 second to process, resetting the 
> spark.locality.wait timeout (3000ms) repeatedly. This caused the whole 
> process to underutilize resources and created an increasing backlog.
> spark.locality.wait should be based on the task set creation time, not last 
> launch time so that after 3000ms of initial creation, all executors can get 
> tasks assigned to them.
> We are specifying a zero timeout for now as a workaround to disable locality 
> optimization. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L556






[jira] [Assigned] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12329:


Assignee: Apache Spark

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Assignee: Apache Spark
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.






[jira] [Assigned] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12329:


Assignee: (was: Apache Spark)

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.






[jira] [Commented] (SPARK-12231) Failed to generate predicate Error when using dropna

2015-12-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057049#comment-15057049
 ] 

Apache Spark commented on SPARK-12231:
--

User 'kevinyu98' has created a pull request for this issue:
https://github.com/apache/spark/pull/10299

> Failed to generate predicate Error when using dropna
> 
>
> Key: SPARK-12231
> URL: https://issues.apache.org/jira/browse/SPARK-12231
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.5.2, 1.6.0
> Environment: python version: 2.7.9
> os: ubuntu 14.04
>Reporter: yahsuan, chang
>
> code to reproduce error
> # write.py
> {code}
> import pyspark
> sc = pyspark.SparkContext()
> sqlc = pyspark.SQLContext(sc)
> df = sqlc.range(10)
> df1 = df.withColumn('a', df['id'] * 2)
> df1.write.partitionBy('id').parquet('./data')
> {code}
> # read.py
> {code}
> import pyspark
> sc = pyspark.SparkContext()
> sqlc = pyspark.SQLContext(sc)
> df2 = sqlc.read.parquet('./data')
> df2.dropna().count()
> {code}
> $ spark-submit write.py
> $ spark-submit read.py
> # error message
> {code}
> 15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to 
> interpreted org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Binding attribute, tree: a#0L
> ...
> {code}
> If the data is written without partitionBy, the error won't happen.






[jira] [Comment Edited] (SPARK-12307) ParquetFormat options should be exposed through the DataFrameReader/Writer options API

2015-12-14 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057067#comment-15057067
 ] 

Hyukjin Kwon edited comment on SPARK-12307 at 12/15/15 12:40 AM:
-

Just my personal thought. I felt the same way about this issue before, but I 
ended up not creating it because there would be too many options. I mean, many 
options are omitted, for example the options for ORC, the Parquet writer 
version, Parquet metadata, etc.

I agree that it is a good idea to add some options like this, but we might need 
to choose some of the important options to expose, with a clear criterion for 
deciding which ones to add.


was (Author: hyukjin.kwon):
Just my personal though. I felt in the same way with this issue before but I 
ended up with not creating this issue because options would be too many. I 
mean, there are many options omitted, for example, the options for ORC, Parquet 
writer version, Parquet metadata and etc.

I agree that it is a good idea to add some options like this but we might need 
to choose some of important options to expose by a certain condition to decide 
to add them.

> ParquetFormat options should be exposed through the DataFrameReader/Writer 
> options API
> --
>
> Key: SPARK-12307
> URL: https://issues.apache.org/jira/browse/SPARK-12307
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: holdenk
>Priority: Trivial
>
> Currently many options for loading/saving Parquet need to be set globally on 
> the SparkContext. It would be useful to also provide support for setting 
> these options through the DataFrameReader/DataFrameWriter.






[jira] [Commented] (SPARK-12307) ParquetFormat options should be exposed through the DataFrameReader/Writer options API

2015-12-14 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057067#comment-15057067
 ] 

Hyukjin Kwon commented on SPARK-12307:
--

Just my personal thought. I felt the same way about this issue before, but I 
ended up not creating it because there would be too many options. I mean, many 
options are omitted, for example the options for ORC, the Parquet writer 
version, Parquet metadata, etc.

I agree that it is a good idea to add some options like this, but we might need 
to choose some of the important options to expose, with a clear criterion for 
deciding which ones to add.
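For comparison, one Parquet option is already exposed per call in the 1.5 documentation (schema merging), which is roughly the shape being asked for more broadly; a sketch assuming Spark 1.5:

{code}
// Already available per read via the options API (see the "Schema Merging"
// section of the 1.5 SQL programming guide):
val merged = sqlContext.read.option("mergeSchema", "true").parquet("/tmp/parquet_table")

// Most other Parquet knobs are still global SQL confs only, for example:
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
{code}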

> ParquetFormat options should be exposed through the DataFrameReader/Writer 
> options API
> --
>
> Key: SPARK-12307
> URL: https://issues.apache.org/jira/browse/SPARK-12307
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: holdenk
>Priority: Trivial
>
> Currently many options for loading/saving Parquet need to be set globally on 
> the SparkContext. It would be useful to also provide support for setting 
> these options through the DataFrameReader/DataFrameWriter.






[jira] [Created] (SPARK-12330) Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk

2015-12-14 Thread Charles Allen (JIRA)
Charles Allen created SPARK-12330:
-

 Summary: Mesos coarse executor does not cleanup blockmgr properly 
on termination if data is stored on disk
 Key: SPARK-12330
 URL: https://issues.apache.org/jira/browse/SPARK-12330
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Mesos
Affects Versions: 1.5.1
Reporter: Charles Allen


A simple line-count example can be launched similarly to:

{code}
SPARK_HOME=/mnt/tmp/spark 
MASTER=mesos://zk://zk.metamx-prod.com:2181/mesos-druid/metrics 
./bin/spark-shell --conf spark.mesos.coarse=true --conf spark.cores.max=7 
--conf spark.mesos.executor.memoryOverhead=2048 --conf 
spark.mesos.executor.home=/mnt/tmp/spark --conf 
spark.executor.extraJavaOptions='-Duser.timezone=UTC -Dfile.encoding=UTF-8 
-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution 
-XX:+PrintFlagsFinal -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MaxDirectMemorySize=1024m 
-verbose:gc -XX:+PrintFlagsFinal -Djava.io.tmpdir=/mnt/tmp/scratch' --conf 
spark.hadoop.fs.s3n.awsAccessKeyId='REDACTED' --conf 
spark.hadoop.fs.s3n.awsSecretAccessKey='REDACTED' --conf 
spark.executor.memory=7g --conf spark.executorEnv.GLOG_v=9 --conf 
spark.storage.memoryFraction=0.0 --conf spark.shuffle.memoryFraction=0.0
{code}

In the shell the following lines can be executed:

{code}
val text_file = 
sc.textFile("s3n://REDACTED/charlesallen/tpch/lineitem.tbl").persist(org.apache.spark.storage.StorageLevel.DISK_ONLY)
{code}
{code}
text_file.map(l => 1).sum
{code}
which will result in
{code}
res0: Double = 6001215.0
{code}
for the TPCH 1GB dataset

Unfortunately the blockmgr directory remains on the executor node after 
termination of the spark context.

The log on the executor looks like this near the termination:

{code}
I1215 02:12:31.190878 130732 process.cpp:566] Parsed message name 
'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.30:58604 
from slave(1)@172.19.67.30:5051
I1215 02:12:31.190928 130732 process.cpp:2382] Spawned process 
__http__(4)@172.19.67.30:58604
I1215 02:12:31.190932 130721 process.cpp:2392] Resuming 
executor(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.190924800+00:00
I1215 02:12:31.190958 130702 process.cpp:2392] Resuming 
__http__(4)@172.19.67.30:58604 at 2015-12-15 02:12:31.190951936+00:00
I1215 02:12:31.190976 130721 exec.cpp:381] Executor asked to shutdown
I1215 02:12:31.190943 130727 process.cpp:2392] Resuming 
__gc__@172.19.67.30:58604 at 2015-12-15 02:12:31.190937088+00:00
I1215 02:12:31.190991 130702 process.cpp:2497] Cleaning up 
__http__(4)@172.19.67.30:58604
I1215 02:12:31.191032 130721 process.cpp:2382] Spawned process 
(2)@172.19.67.30:58604
I1215 02:12:31.191040 130702 process.cpp:2392] Resuming (2)@172.19.67.30:58604 
at 2015-12-15 02:12:31.191037952+00:00
I1215 02:12:31.191054 130702 exec.cpp:80] Scheduling shutdown of the executor
I1215 02:12:31.191069 130721 exec.cpp:396] Executor::shutdown took 21572ns
I1215 02:12:31.191073 130702 clock.cpp:260] Created a timer for 
(2)@172.19.67.30:58604 in 5secs in the future (2015-12-15 
02:12:36.191062016+00:00)
I1215 02:12:31.191066 130720 process.cpp:2392] Resuming (1)@172.19.67.30:58604 
at 2015-12-15 02:12:31.191059200+00:00
15/12/15 02:12:31 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
I1215 02:12:31.240103 130732 clock.cpp:151] Handling timers up to 2015-12-15 
02:12:31.240091136+00:00
I1215 02:12:31.240123 130732 clock.cpp:158] Have timeout(s) at 2015-12-15 
02:12:31.240036096+00:00
I1215 02:12:31.240183 130730 process.cpp:2392] Resuming 
reaper(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.240178176+00:00
I1215 02:12:31.240226 130730 clock.cpp:260] Created a timer for 
reaper(1)@172.19.67.30:58604 in 100ms in the future (2015-12-15 
02:12:31.340212992+00:00)
I1215 02:12:31.247019 130720 clock.cpp:260] Created a timer for 
(1)@172.19.67.30:58604 in 3secs in the future (2015-12-15 
02:12:34.247005952+00:00)
15/12/15 02:12:31 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: 
SIGTERM
15/12/15 02:12:31 INFO ShutdownHookManager: Shutdown hook called

no more java logs
{code}

If the shuffle fraction is NOT set to 0.0, and the data is allowed to stay in 
memory, then the following log can be seen at termination instead:
{code}
I1215 01:19:16.247705 120052 process.cpp:566] Parsed message name 
'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.24:60016 
from slave(1)@172.19.67.24:5051
I1215 01:19:16.247745 120052 process.cpp:2382] Spawned process 
__http__(4)@172.19.67.24:60016
I1215 01:19:16.247747 120034 process.cpp:2392] Resuming 
executor(1)@172.19.67.24:60016 at 2015-12-15 01:19:16.247741952+00:00
I1215 01:19:16.247758 120030 process.cpp:2392] Resuming 
__gc__@172.19.67.24:60016 at 2015-12-15 

[jira] [Commented] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056994#comment-15056994
 ] 

Felix Cheung commented on SPARK-12327:
--

Yeah, I think these are overactive checks in lint-r that I didn't take out 
earlier.

"list(`1` = 2, `3` = 4)" looks legitimate as an example, not commented code.

> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> R/deserialize.R:30:3: style: Commented code should be removed.
> # Date -> Date
>   ^~~~
> 

[jira] [Commented] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057030#comment-15057030
 ] 

Shivaram Venkataraman commented on SPARK-12327:
---

This test is currently disabled by https://github.com/apache/spark/pull/10300 
-- I'm leaving the JIRA open to address the problem better.

> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> R/deserialize.R:30:3: style: Commented code should be removed.
> # Date -> Date
>   ^~~~
> R/deserialize.R:31:3: 

[jira] [Assigned] (SPARK-11097) Add connection established callback to lower level RPC layer so we don't need to check for new connections in NettyRpcHandler.receive

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11097:


Assignee: (was: Apache Spark)

> Add connection established callback to lower level RPC layer so we don't need 
> to check for new connections in NettyRpcHandler.receive
> -
>
> Key: SPARK-11097
> URL: https://issues.apache.org/jira/browse/SPARK-11097
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>
> I think we can remove the check for new connections in 
> NettyRpcHandler.receive if we just add a channel registered callback to the 
> lower level network module.






[jira] [Assigned] (SPARK-11097) Add connection established callback to lower level RPC layer so we don't need to check for new connections in NettyRpcHandler.receive

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11097:


Assignee: Apache Spark

> Add connection established callback to lower level RPC layer so we don't need 
> to check for new connections in NettyRpcHandler.receive
> -
>
> Key: SPARK-11097
> URL: https://issues.apache.org/jira/browse/SPARK-11097
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> I think we can remove the check for new connections in 
> NettyRpcHandler.receive if we just add a channel registered callback to the 
> lower level network module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12274) WrapOption should not have type constraint for child

2015-12-14 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-12274.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10263
[https://github.com/apache/spark/pull/10263]

> WrapOption should not have type constraint for child
> 
>
> Key: SPARK-12274
> URL: https://issues.apache.org/jira/browse/SPARK-12274
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12328) Add connectionEstablished callback to RpcHandler to monitor the new connections

2015-12-14 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-12328.
--
Resolution: Duplicate

> Add connectionEstablished callback to RpcHandler to monitor the new 
> connections
> ---
>
> Key: SPARK-12328
> URL: https://issues.apache.org/jira/browse/SPARK-12328
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> `NettyRpcHandler` uses a `ConcurrentHashMap` of clients to remember whether it 
> has received any messages from a client, to avoid firing multiple 
> `RemoteProcessConnected` events.
> We can add a callback to the RpcHandler so that the ConcurrentHashMap can be 
> removed from NettyRpcHandler.
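As a rough sketch of the bookkeeping described above (a simplified, hypothetical tracker, not the actual NettyRpcHandler code), a map keyed by remote address is what lets the handler fire `RemoteProcessConnected` only for the first message seen from each client; a connection-established callback in the lower RPC layer would make this map unnecessary.

{code}
// Hypothetical, simplified tracker (an assumption, not the actual NettyRpcHandler code).
import java.net.InetSocketAddress
import java.util.concurrent.ConcurrentHashMap

class ConnectionTracker {
  private val clients = new ConcurrentHashMap[InetSocketAddress, java.lang.Boolean]()

  // Called on every received message; returns true only the first time the
  // given client address is seen, i.e. when the "connected" event should fire.
  def firstMessageFrom(addr: InetSocketAddress): Boolean =
    clients.putIfAbsent(addr, java.lang.Boolean.TRUE) == null

  // With a connection-established callback in the lower RPC layer, this map
  // (and the per-message check) could be dropped entirely.
  def forget(addr: InetSocketAddress): Unit = {
    clients.remove(addr)
  }
}
{code}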



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11097) Add connection established callback to lower level RPC layer so we don't need to check for new connections in NettyRpcHandler.receive

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11097:


Assignee: Apache Spark

> Add connection established callback to lower level RPC layer so we don't need 
> to check for new connections in NettyRpcHandler.receive
> -
>
> Key: SPARK-11097
> URL: https://issues.apache.org/jira/browse/SPARK-11097
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> I think we can remove the check for new connections in 
> NettyRpcHandler.receive if we just add a channel registered callback to the 
> lower level network module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12329:


Assignee: Apache Spark

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Assignee: Apache Spark
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12329:


Assignee: (was: Apache Spark)

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12329:


Assignee: Apache Spark

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Assignee: Apache Spark
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman reassigned SPARK-12327:
-

Assignee: (was: Apache Spark)

> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> R/deserialize.R:30:3: style: Commented code should be removed.
> # Date -> Date
>   ^~~~
> R/deserialize.R:31:3: style: Commented code should be removed.
> # Time -> POSIXct
>   ^~~
> R/deserialize.R:33:3: style: Commented code should 

[jira] [Commented] (SPARK-12328) Add connectionEstablished callback to RpcHandler to monitor the new connections

2015-12-14 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057072#comment-15057072
 ] 

Marcelo Vanzin commented on SPARK-12328:


SPARK-11097?

> Add connectionEstablished callback to RpcHandler to monitor the new 
> connections
> ---
>
> Key: SPARK-12328
> URL: https://issues.apache.org/jira/browse/SPARK-12328
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> `NettyRpcHandler` uses a `ConcurrentHashMap` of clients to remember whether it 
> has received any messages from a client, to avoid firing multiple 
> `RemoteProcessConnected` events.
> We can add a callback to the RpcHandler so that the ConcurrentHashMap can be 
> removed from NettyRpcHandler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12329) spark-sql prints out SET commands to stdout instead of stderr

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12329:


Assignee: (was: Apache Spark)

> spark-sql prints out SET commands to stdout instead of stderr
> -
>
> Key: SPARK-12329
> URL: https://issues.apache.org/jira/browse/SPARK-12329
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Ashwin Shankar
>Priority: Minor
>
> When I run "$spark-sql -f ", I see that a few "SET key value" messages 
> get printed on stdout instead of stderr. These messages should go to stderr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10312) Enhance SerDe to handle atomic vector

2015-12-14 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055681#comment-15055681
 ] 

Sun Rui commented on SPARK-10312:
-

The gap between R and Scala/Java is that R has no scalar types.
If we want to support this, the pseudo code in SerDe would look like:
{code}
  if (object is an atomic vector) {
if (length(object) == 1) {
  write it as a scalar value
} else {
  # length(object) == 0 or length(object) > 1
  if (there is any NA in the vector) {
promote it to be a list, and write the list
  } else {
write it as an array
  }
}
  }
{code}

The problem with supporting this feature is that it may confuse users. Take 
read.parquet for example:
{code}
read.parquet(sqlContext, c("path1", "path2")) will work,
while read.parquet(sqlContext, c("path1")) won't work,  // because method 
signature does not match on JVM side
but read.parquet(sqlContext, as.list(c("path1"))) will work
{code}

So maybe the current behavior is better, that is:
for a vector, SerDe always writes it as a scalar value. In order to fully write 
a vector, as.list() is required.

> Enhance SerDe to handle atomic vector
> -
>
> Key: SPARK-10312
> URL: https://issues.apache.org/jira/browse/SPARK-10312
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 1.4.1
>Reporter: Sun Rui
>
> Currently, writeObject() does not handle atomic vectors well. For an atomic 
> vector, it treats it like a scalar object. For example, if you pass c(1:10) 
> into writeObject, it will write only a single integer, 1. You have to 
> explicitly cast an atomic vector to a list, for example as.list(1:10), if you 
> want to write the whole vector.
> Could we enhance the SerDe so that when the object is an atomic vector whose 
> length is > 1, it is converted to a list and then written?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2015-12-14 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055633#comment-15055633
 ] 

Michael Han edited comment on SPARK-2356 at 12/14/15 10:03 AM:
---

Hello Everyone,

I encountered this issue again today when I tried to create a cluster using two 
Windows 7 (64-bit) desktops.
The error happens when I register the second worker to the master using the 
following command:
spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077

Strangely, it works fine when I register the first worker to the master.
Does anyone know a workaround for this issue?
The workaround above works fine when I use local mode.
I registered one worker successfully in the cluster, but when I run 
spark-submit on that worker, it also throws this exception.
I have searched the entire internet and have not seen anybody who has managed 
to deploy a Windows Spark cluster successfully without Hadoop. I have a demo in 
a few days, so I hope someone can help me with this ;) thank you. Otherwise I 
will have to run VMwares.

I tried to set the HADOOP_HOME = C:\winutil in the env variables, but it 
doesn't work.
The error is:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/14 16:49:22 WARN NativeCodeLoader: Unable to load native-hadoop library fo
r your platform... using builtin-java classes where applicable
15/12/14 16:49:22 ERROR Shell: Failed to locate the winutils binary in the hadoo
p binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Ha
doop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.(Shell.java:363)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)

at org.apache.hadoop.security.Groups.(Groups.java:86)
at org.apache.hadoop.security.Groups.(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Group
s.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupI
nformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(Use
rGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(
UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGrou
pInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGr
oupInformation.java:621)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils
.scala:2091)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils
.scala:2091)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2091)
at org.apache.spark.SecurityManager.(SecurityManager.scala:212)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.
scala:692)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:674)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
15/12/14 16:49:22 INFO SecurityManager: Changing view acls to: mh6
15/12/14 16:49:22 INFO SecurityManager: Changing modify acls to: mh6
15/12/14 16:49:22 INFO SecurityManager: SecurityManager: authentication disabled
; ui acls disabled; users with view permissions: Set(mh6); users with modify per
missions: Set(mh6)
15/12/14 16:49:23 INFO Slf4jLogger: Slf4jLogger started
15/12/14 16:49:23 INFO Remoting: Starting remoting
15/12/14 16:49:24 INFO Remoting: Remoting started; listening on addresses :[akka
.tcp://sparkWorker@167.3.129.160:46862]
15/12/14 16:49:24 INFO Utils: Successfully started service 'sparkWorker' on port
 46862.
15/12/14 16:49:24 INFO Worker: Starting Spark worker 167.3.129.160:46862 with 4
cores, 2.9 GB RAM
15/12/14 16:49:24 INFO Worker: Running Spark version 1.5.2
15/12/14 16:49:24 INFO Worker: Spark home: C:\spark-1.5.2-bin-hadoop2.6\bin\..
15/12/14 16:49:24 INFO Utils: Successfully started service 'WorkerUI' on port 80
81.
15/12/14 16:49:24 INFO WorkerWebUI: Started WorkerWebUI at http://167.3.129.160:
8081
15/12/14 16:49:24 INFO Worker: Connecting to master 192.168.79.1:7077...
15/12/14 16:49:39 INFO Worker: Retrying connection to master (attempt # 1)
15/12/14 16:49:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thr
ead Thread[sparkWorker-akka.actor.default-dispatcher-2,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.Futur
eTask@3ef5e68c rejected from java.util.concurrent.ThreadPoolExecutor@741cb720[Ru
nning, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]

at 

[jira] [Resolved] (SPARK-12176) SparkLauncher's setConf() does not support configs containing spaces

2015-12-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12176.
---
Resolution: Not A Problem

> SparkLauncher's setConf() does not support configs containing spaces
> 
>
> Key: SPARK-12176
> URL: https://issues.apache.org/jira/browse/SPARK-12176
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2
> Environment: All
>Reporter: Yuhang Chen
>Priority: Minor
>
> spark-submit uses the '--conf K=V' pattern for setting configs. According to 
> the docs, if the 'V' you set has spaces in it, the whole 'K=V' part should 
> be wrapped in quotes. 
> However, SparkLauncher (org.apache.spark.launcher.SparkLauncher) does 
> not do that wrapping for you, and there is no way to do the wrapping yourself 
> with the API it provides.
> For example, I want to add {{-XX:+PrintGCDetails -XX:+PrintGCTimeStamps}} for 
> executors (spark.executor.extraJavaOptions), and the conf contains a space in 
> it. 
> For spark-submit, I should wrap the conf with quotes like this:
> {code}
> --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails 
> -XX:+PrintGCTimeStamps"
> {code}
> But when I use the setConf() API of SparkLauncher, I write code like this:
> {code}
> launcher.setConf("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails 
> -XX:+PrintGCTimeStamps");
> {code} 
> Now, SparkLauncher uses Java's ProcessBuilder to start a sub-process, in 
> which spark-submit is finally executed. It turns out that the final 
> command looks like this:
> {code} 
> --conf spark.executor.extraJavaOptions=-XX:+PrintGCDetails 
> -XX:+PrintGCTimeStamps
> {code} 
> See? The quotes are gone, and the job could not be launched with this 
> command. 
> Then I checked the source: all confs are stored in a Map before the launch 
> command is generated. Thus, my advice is to check all values of the conf Map 
> and do the wrapping during command building.
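For illustration, a minimal Scala sketch of driving the same configuration through the launcher API; the jar path, main class and master URL below are placeholder assumptions, and only the setConf() call mirrors the report. Because ProcessBuilder hands each argument to the child process as a separate argv entry rather than through a shell, the space inside the value is preserved without any manual quoting.

{code}
// Hedged sketch, not taken from the ticket: paths, class name and master URL
// are placeholders; only the setConf() call mirrors the reported usage.
import org.apache.spark.launcher.SparkLauncher

object LaunchWithSpaces {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setAppResource("/path/to/app.jar")      // hypothetical application jar
      .setMainClass("com.example.Main")        // hypothetical main class
      .setMaster("spark://master:7077")        // hypothetical master URL
      .setConf("spark.executor.extraJavaOptions",
               "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
      .launch()
    process.waitFor()
  }
}
{code}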



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12275) No plan for BroadcastHint in some condition

2015-12-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055760#comment-15055760
 ] 

Apache Spark commented on SPARK-12275:
--

User 'yucai' has created a pull request for this issue:
https://github.com/apache/spark/pull/10291

> No plan for BroadcastHint in some condition
> ---
>
> Key: SPARK-12275
> URL: https://issues.apache.org/jira/browse/SPARK-12275
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: yucai
>Assignee: yucai
> Fix For: 1.6.1, 2.0.0
>
>
> *Summary*
> No plan for BroadcastHint is generated under some conditions.
> *Test Case*
> {code}
> val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value")
> val parquetTempFile =
>   "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), 
> scala.util.Random.nextInt)
> df1.write.parquet(parquetTempFile)
> val pf1 = sqlContext.read.parquet(parquetTempFile)
> #1. df1.join(broadcast(pf1)).count()
> #2. broadcast(pf1).count()
> {code}
> *Result*
> It will trigger assertion in QueryPlanner.scala, like below:
> {code}
> scala> df1.join(broadcast(pf1)).count()
> java.lang.AssertionError: assertion failed: No plan for BroadcastHint
> +- Relation[key#6,value#7] 
> ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet]
>   at scala.Predef$.assert(Predef.scala:179)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12317) Support configurate value with unit(e.g. kb/mb/gb) in SQL

2015-12-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055887#comment-15055887
 ] 

Sean Owen commented on SPARK-12317:
---

That makes some sense, and is easy to support since our utility methods for 
parsing strings like "10m" will continue to treat an un-suffixed number as 
bytes.
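For reference, a small sketch against one of those existing parsers (JavaUtils.byteStringAsBytes; assuming it is the helper meant here), showing the suffixed and un-suffixed spellings resolving to the same byte count:

{code}
// Hedged sketch: an un-suffixed number is treated as bytes, so both spellings agree.
import org.apache.spark.network.util.JavaUtils

object ByteStringDemo {
  def main(args: Array[String]): Unit = {
    val withUnit    = JavaUtils.byteStringAsBytes("10m")       // 10 * 1024 * 1024
    val withoutUnit = JavaUtils.byteStringAsBytes("10485760")  // plain bytes
    assert(withUnit == withoutUnit)
    println(withUnit)
  }
}
{code}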

> Support configurate value with unit(e.g. kb/mb/gb) in SQL
> -
>
> Key: SPARK-12317
> URL: https://issues.apache.org/jira/browse/SPARK-12317
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Yadong Qi
>Priority: Minor
>
> e.g. `spark.sql.autoBroadcastJoinThreshold` should be configurable as `10MB` 
> instead of `10485760`, because `10MB` is much easier to read than `10485760`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055908#comment-15055908
 ] 

Apache Spark commented on SPARK-12177:
--

User 'nikit-os' has created a pull request for this issue:
https://github.com/apache/spark/pull/10294

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released and it introduces a new consumer API that 
> is not compatible with the old one. So, I added the new consumer API. I made 
> separate classes in package org.apache.spark.streaming.kafka.v09 with the 
> changed API. I didn't remove the old classes, for better backward 
> compatibility. Users will not need to change their old Spark applications 
> when they upgrade to a new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1529) Support DFS based shuffle in addition to Netty shuffle

2015-12-14 Thread Da Fox (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055916#comment-15055916
 ] 

Da Fox commented on SPARK-1529:
---

Hi,
are there any updates for this improvement?

We are running Spark on YARN with a MapR distribution hadoop cluster with small 
local disks. Small disk setup is not uncommon. Requiring local disk space just 
for Spark scratch space seems like a waste of disk space which can be part of 
hdfs. What should be the size of local disk anyway? Should it be dependent on 
the size of datasets we are processing? Doesn't a multi-user environment make 
the problem even worse?

Another issue is with configuration of {{spark.local.dir}} for Spark on YARN: 
{quote}In Spark 1.0 and later this will be overriden by SPARK_LOCAL_DIRS 
(Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the 
cluster manager. {quote}
Spark now uses the same local directory as the Node Manager. If we use the 
workaround with NFS mount, it requires us to move Node Manager directory to NFS 
too, which seems obscure (should we create a separate ticket for that?).

We would really appreciate if you implement a solution for this ticket.

Thanks.

> Support DFS based shuffle in addition to Netty shuffle
> --
>
> Key: SPARK-1529
> URL: https://issues.apache.org/jira/browse/SPARK-1529
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Kannan Rajah
> Attachments: Spark Shuffle using HDFS.pdf
>
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. Shuffle is implemented by writing intermediate 
> data to local disk and serving it to remote node using Netty as a transport 
> mechanism. We want to provide an HDFS based shuffle such that data can be 
> written to HDFS (instead of local disk) and served using HDFS API on the 
> remote nodes. This could involve exposing a file system abstraction to Spark 
> shuffle and have 2 modes of running it. In default mode, it will write to 
> local disk and in the DFS mode, it will write to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-1529) Support DFS based shuffle in addition to Netty shuffle

2015-12-14 Thread Da Fox (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055916#comment-15055916
 ] 

Da Fox edited comment on SPARK-1529 at 12/14/15 12:34 PM:
--

Hi,
are there any updates for this improvement?

We are running Spark on YARN with a MapR distribution hadoop cluster with small 
local disks. Small disk setup is not uncommon. Requiring local disk space just 
for Spark scratch space seems like a waste of disk space which can be part of 
hdfs. What should be the size of local disk anyway? Should it be dependent on 
the size of datasets we are processing? Doesn't a multi-user environment make 
the problem even worse?

Another issue is with configuration of {{spark.local.dir}} for Spark on YARN: 
{quote}In Spark 1.0 and later this will be overriden by SPARK_LOCAL_DIRS 
(Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the 
cluster manager. {quote}
Spark now uses the same local directory as the Node Manager. If we use the 
workaround with NFS mount, it requires us to move Node Manager directory to NFS 
too, which seems obscure (should we create a separate ticket for that?).

We would really appreciate if you implement a solution for this ticket. We can 
also see  a lot of questions on forums such as 
[StackOverflow|http://stackoverflow.com/q/31303568/878613] related to Spark 
running out of space for scratch dir, so the community would surely appreciate 
it too.

Thanks.


was (Author: dafox777):
Hi,
are there any updates for this improvement?

We are running Spark on YARN with a MapR distribution hadoop cluster with small 
local disks. Small disk setup is not uncommon. Requiring local disk space just 
for Spark scratch space seems like a waste of disk space which can be part of 
hdfs. What should be the size of local disk anyway? Should it be dependent on 
the size of datasets we are processing? Doesn't a multi-user environment make 
the problem even worse?

Another issue is with configuration of {{spark.local.dir}} for Spark on YARN: 
{quote}In Spark 1.0 and later this will be overriden by SPARK_LOCAL_DIRS 
(Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the 
cluster manager. {quote}
Spark now uses the same local directory as the Node Manager. If we use the 
workaround with NFS mount, it requires us to move Node Manager directory to NFS 
too, which seems obscure (should we create a separate ticket for that?).

We would really appreciate if you implement a solution for this ticket.

Thanks.

> Support DFS based shuffle in addition to Netty shuffle
> --
>
> Key: SPARK-1529
> URL: https://issues.apache.org/jira/browse/SPARK-1529
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Kannan Rajah
> Attachments: Spark Shuffle using HDFS.pdf
>
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. Shuffle is implemented by writing intermediate 
> data to local disk and serving it to remote node using Netty as a transport 
> mechanism. We want to provide an HDFS based shuffle such that data can be 
> written to HDFS (instead of local disk) and served using HDFS API on the 
> remote nodes. This could involve exposing a file system abstraction to Spark 
> shuffle and have 2 modes of running it. In default mode, it will write to 
> local disk and in the DFS mode, it will write to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12320) throw exception if the number of fields does not line up for Tuple encoder

2015-12-14 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-12320:
---

 Summary: throw exception if the number of fields does not line up 
for Tuple encoder
 Key: SPARK-12320
 URL: https://issues.apache.org/jira/browse/SPARK-12320
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Wenchen Fan
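A hedged illustration of the kind of mismatch in question (an assumption about the scenario, not taken from the ticket): a two-field tuple encoder applied to a three-column result, where the field counts do not line up and should produce an explicit exception.

{code}
// Hedged illustration; the column names and data are invented for the example.
import org.apache.spark.sql.SQLContext

object TupleEncoderMismatch {
  def run(sqlContext: SQLContext): Unit = {
    import sqlContext.implicits._
    val df = Seq((1, "a", true)).toDF("i", "s", "b") // three columns
    val ds = df.as[(Int, String)]                    // tuple encoder expects two fields
    val rows = ds.collect()                          // field counts do not line up here
    rows.foreach(println)
  }
}
{code}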






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12317) Support configurate value with unit(e.g. kb/mb/gb) in SQL

2015-12-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12317:
--
Priority: Minor  (was: Major)

> Support configurate value with unit(e.g. kb/mb/gb) in SQL
> -
>
> Key: SPARK-12317
> URL: https://issues.apache.org/jira/browse/SPARK-12317
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Yadong Qi
>Priority: Minor
>
> e.g. `spark.sql.autoBroadcastJoinThreshold` should be configurable as `10MB` 
> instead of `10485760`, because `10MB` is much easier to read than `10485760`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12318) Save mode in SparkR should be error by default

2015-12-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055745#comment-15055745
 ] 

Apache Spark commented on SPARK-12318:
--

User 'zjffdu' has created a pull request for this issue:
https://github.com/apache/spark/pull/10290

> Save mode in SparkR should be error by default
> --
>
> Key: SPARK-12318
> URL: https://issues.apache.org/jira/browse/SPARK-12318
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.2
>Reporter: Jeff Zhang
>Priority: Minor
>
> The save mode in SparkR should be consistent with that of the Scala API.
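For reference, a minimal Scala-side sketch of the default this refers to; the DataFrame and output path are placeholders. The Scala DataFrameWriter errors out when the target already exists unless another SaveMode is chosen.

{code}
// Hedged sketch: df and the output path are placeholder assumptions.
import org.apache.spark.sql.{DataFrame, SaveMode}

object SaveModeDefaults {
  def save(df: DataFrame, path: String): Unit = {
    df.write.parquet(path)                           // default: error if path exists
    df.write.mode(SaveMode.Overwrite).parquet(path)  // must opt in to overwrite
  }
}
{code}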



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12318) Save mode in SparkR should be error by default

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12318:


Assignee: Apache Spark

> Save mode in SparkR should be error by default
> --
>
> Key: SPARK-12318
> URL: https://issues.apache.org/jira/browse/SPARK-12318
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.2
>Reporter: Jeff Zhang
>Assignee: Apache Spark
>Priority: Minor
>
> The save mode in SparkR should be consistent with that of the Scala API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12319) Address endian specific problems surfaced in 1.6

2015-12-14 Thread Adam Roberts (JIRA)
Adam Roberts created SPARK-12319:


 Summary: Address endian specific problems surfaced in 1.6
 Key: SPARK-12319
 URL: https://issues.apache.org/jira/browse/SPARK-12319
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
 Environment: BE platforms
Reporter: Adam Roberts
Priority: Critical


JIRA to cover endian specific problems - since testing 1.6 I've noticed 
problems with DataFrames on BE platforms, e.g. 
https://issues.apache.org/jira/browse/SPARK-9858



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12318) Save mode in SparkR should be error by default

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12318:


Assignee: (was: Apache Spark)

> Save mode in SparkR should be error by default
> --
>
> Key: SPARK-12318
> URL: https://issues.apache.org/jira/browse/SPARK-12318
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.2
>Reporter: Jeff Zhang
>Priority: Minor
>
> The save mode in SparkR should be consistent with that of the Scala API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4497) HiveThriftServer2 does not exit properly on failure

2015-12-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055924#comment-15055924
 ] 

Jeff Zhang commented on SPARK-4497:
---

Can not reproduce it. [~yanakad] Is this still an issue for you ?

> HiveThriftServer2 does not exit properly on failure
> ---
>
> Key: SPARK-4497
> URL: https://issues.apache.org/jira/browse/SPARK-4497
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Yana Kadiyska
>Priority: Critical
>
> start thriftserver with 
> {{sbin/start-thriftserver.sh --master ...}}
> If there is an error (in my case namenode is in standby mode) the driver 
> shuts down properly:
> {code}
> 14/11/19 16:32:58 ERROR HiveThriftServer2: Error starting HiveThriftServer2
> 
> 14/11/19 16:32:59 INFO SparkUI: Stopped Spark web UI at http://myip:4040
> 14/11/19 16:32:59 INFO DAGScheduler: Stopping DAGScheduler
> 14/11/19 16:32:59 INFO SparkDeploySchedulerBackend: Shutting down all 
> executors
> 14/11/19 16:32:59 INFO SparkDeploySchedulerBackend: Asking each executor to 
> shut down
> 14/11/19 16:33:00 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor 
> stopped!
> 14/11/19 16:33:00 INFO MemoryStore: MemoryStore cleared
> 14/11/19 16:33:00 INFO BlockManager: BlockManager stopped
> 14/11/19 16:33:00 INFO BlockManagerMaster: BlockManagerMaster stopped
> 14/11/19 16:33:00 INFO SparkContext: Successfully stopped SparkContext
> {code}
> but trying to run {{sbin/start-thriftserver.sh --master ... }} again results 
> in an error that Thrifserver is already running.
> {{ps -aef|grep }} shows
> {code}
> root 32334 1  0 16:32 ?00:00:00 /usr/local/bin/java 
> org.apache.spark.deploy.SparkSubmitDriverBootstrapper --class 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --master 
> spark://myip:7077 --conf -spark.executor.extraJavaOptions=-verbose:gc 
> -XX:-PrintGCDetails -XX:+PrintGCTimeStamps spark-internal --hiveconf 
> hive.root.logger=INFO,console
> {code}
> This is problematic since we have a process that tries to restart the driver 
> if it dies



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-14 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055937#comment-15055937
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

Do we still need DirectKafkaInputDStream? We shouldn't use Zookeeper directly 
any more. Right now DirectKafkaInputDStream is similar to KafkaInputDStream, 
but with manual offset management.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released and it introduces a new consumer API that 
> is not compatible with the old one. So, I added the new consumer API. I made 
> separate classes in package org.apache.spark.streaming.kafka.v09 with the 
> changed API. I didn't remove the old classes, for better backward 
> compatibility. Users will not need to change their old Spark applications 
> when they upgrade to a new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12303) Configuration parameter by which can choose if we want the REPL generated class directory name to be random or fixed name.

2015-12-14 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055862#comment-15055862
 ] 

Prashant Sharma commented on SPARK-12303:
-

Given that one can already set the root directory `spark.repl.classdir` under 
which the temp directory is created, I cannot come up with any benefit of 
having a non-random directory. Also, one might need to clean it up before each 
run, because the class name series for each command, such as `$line1`, would 
conflict with the contents of the directory.


> Configuration parameter by which  can choose if we want the REPL generated 
> class directory name to be random or fixed name.
> ---
>
> Key: SPARK-12303
> URL: https://issues.apache.org/jira/browse/SPARK-12303
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Shell
>Reporter: piyush
>Priority: Minor
>
>  .class files generated by the Spark REPL are stored in a temp directory with 
> a random name.
> Add a configuration parameter by which we can choose whether we want the 
> REPL-generated class directory name to be random or fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2015-12-14 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055633#comment-15055633
 ] 

Michael Han edited comment on SPARK-2356 at 12/14/15 9:09 AM:
--

Hello Everyone,

I encountered this issue again today when I tried to create a cluster using two 
Windows 7 (64-bit) desktops.
The error happens when I register the second worker to the master using the 
following command:
spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077

Strangely, it works fine when I register the first worker to the master.
Does anyone know a workaround for this issue?
The workaround above works fine when I use local mode.

I tried to set the HADOOP_HOME = C:\winutil in the env variables, but it 
doesn't work.
The error is:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/14 16:49:22 WARN NativeCodeLoader: Unable to load native-hadoop library fo
r your platform... using builtin-java classes where applicable
15/12/14 16:49:22 ERROR Shell: Failed to locate the winutils binary in the hadoo
p binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Ha
doop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.(Shell.java:363)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)

at org.apache.hadoop.security.Groups.(Groups.java:86)
at org.apache.hadoop.security.Groups.(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Group
s.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupI
nformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(Use
rGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(
UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGrou
pInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGr
oupInformation.java:621)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils
.scala:2091)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils
.scala:2091)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2091)
at org.apache.spark.SecurityManager.(SecurityManager.scala:212)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.
scala:692)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:674)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
15/12/14 16:49:22 INFO SecurityManager: Changing view acls to: mh6
15/12/14 16:49:22 INFO SecurityManager: Changing modify acls to: mh6
15/12/14 16:49:22 INFO SecurityManager: SecurityManager: authentication disabled
; ui acls disabled; users with view permissions: Set(mh6); users with modify per
missions: Set(mh6)
15/12/14 16:49:23 INFO Slf4jLogger: Slf4jLogger started
15/12/14 16:49:23 INFO Remoting: Starting remoting
15/12/14 16:49:24 INFO Remoting: Remoting started; listening on addresses :[akka
.tcp://sparkWorker@167.3.129.160:46862]
15/12/14 16:49:24 INFO Utils: Successfully started service 'sparkWorker' on port
 46862.
15/12/14 16:49:24 INFO Worker: Starting Spark worker 167.3.129.160:46862 with 4
cores, 2.9 GB RAM
15/12/14 16:49:24 INFO Worker: Running Spark version 1.5.2
15/12/14 16:49:24 INFO Worker: Spark home: C:\spark-1.5.2-bin-hadoop2.6\bin\..
15/12/14 16:49:24 INFO Utils: Successfully started service 'WorkerUI' on port 80
81.
15/12/14 16:49:24 INFO WorkerWebUI: Started WorkerWebUI at http://167.3.129.160:
8081
15/12/14 16:49:24 INFO Worker: Connecting to master 192.168.79.1:7077...
15/12/14 16:49:39 INFO Worker: Retrying connection to master (attempt # 1)
15/12/14 16:49:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thr
ead Thread[sparkWorker-akka.actor.default-dispatcher-2,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.Futur
eTask@3ef5e68c rejected from java.util.concurrent.ThreadPoolExecutor@741cb720[Ru
nning, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]

at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution
(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.jav
a:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.ja
va:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorS
ervice.java:112)
at 

[jira] [Updated] (SPARK-12319) Address endian specific problems surfaced in 1.6

2015-12-14 Thread Adam Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Roberts updated SPARK-12319:
-
Description: 
JIRA to cover endian specific problems - since testing 1.6 I've noticed 
problems with DataFrames on BE platforms, e.g. 
https://issues.apache.org/jira/browse/SPARK-9858

[~joshrosen] [~yhuai]

Current progress: using com.google.common.io.LittleEndianDataInputStream and 
com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer 
fixes three test failures in ExchangeCoordinatorSuite, but I'm concerned about 
performance and wider functional implications.

"org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input 
with reordering" fails as we expect "one, 1" but instead get "one, 9" - we 
believe the issue lies within BitSetMethods.java, specifically around: return 
(wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); 

  was:JIRA to cover endian specific problems - since testing 1.6 I've noticed 
problems with DataFrames on BE platforms, e.g. 
https://issues.apache.org/jira/browse/SPARK-9858


> Address endian specific problems surfaced in 1.6
> 
>
> Key: SPARK-12319
> URL: https://issues.apache.org/jira/browse/SPARK-12319
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: BE platforms
>Reporter: Adam Roberts
>Priority: Critical
>
> JIRA to cover endian specific problems - since testing 1.6 I've noticed 
> problems with DataFrames on BE platforms, e.g. 
> https://issues.apache.org/jira/browse/SPARK-9858
> [~joshrosen] [~yhuai]
> Current progress: using com.google.common.io.LittleEndianDataInputStream and 
> com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer 
> fixes three test failures in ExchangeCoordinatorSuite, but I'm concerned 
> about performance and wider functional implications.
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input 
> with reordering" fails as we expect "one, 1" but instead get "one, 9" - we 
> believe the issue lies within BitSetMethods.java, specifically around: return 
> (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); 
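A minimal sketch of the byte-order idea being tried, using plain in-memory streams as stand-ins (this is not the actual UnsafeRowSerializer change):

{code}
// Hedged sketch: ByteArray streams stand in for the serializer's real streams.
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import com.google.common.io.{LittleEndianDataInputStream, LittleEndianDataOutputStream}

object LittleEndianRoundTrip {
  def main(args: Array[String]): Unit = {
    val buffer = new ByteArrayOutputStream()
    val out = new LittleEndianDataOutputStream(buffer)
    out.writeInt(42)   // written little-endian regardless of the platform's byte order
    out.flush()

    val in = new LittleEndianDataInputStream(new ByteArrayInputStream(buffer.toByteArray))
    assert(in.readInt() == 42)   // read back with the same, explicit byte order
  }
}
{code}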



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12303) Configuration parameter by which can choose if we want the REPL generated class directory name to be random or fixed name.

2015-12-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12303.
---
Resolution: Won't Fix

> Configuration parameter by which  can choose if we want the REPL generated 
> class directory name to be random or fixed name.
> ---
>
> Key: SPARK-12303
> URL: https://issues.apache.org/jira/browse/SPARK-12303
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Shell
>Reporter: piyush
>Priority: Minor
>
>  .class files generated by the Spark REPL are stored in a temp directory with 
> a random name.
> Add a configuration parameter by which we can choose whether we want the 
> REPL-generated class directory name to be random or fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-14 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055911#comment-15055911
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

I moved all changes for Kafka 0.9 to separate subproject.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released and it introduces a new consumer API that 
> is not compatible with the old one. So, I added the new consumer API. I made 
> separate classes in package org.apache.spark.streaming.kafka.v09 with the 
> changed API. I didn't remove the old classes, for better backward 
> compatibility. Users will not need to change their old Spark applications 
> when they upgrade to a new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12332) Typo in ResetSystemProperties.scala's comments

2015-12-14 Thread holdenk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk updated SPARK-12332:

Component/s: Tests

> Typo in ResetSystemProperties.scala's comments
> --
>
> Key: SPARK-12332
> URL: https://issues.apache.org/jira/browse/SPARK-12332
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: holdenk
>Priority: Trivial
>
> There is a minor typo (missing close bracket) inside of 
> ResetSystemProperties.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12332) Typo in ResetSystemProperties.scala's comments

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12332:


Assignee: (was: Apache Spark)

> Typo in ResetSystemProperties.scala's comments
> --
>
> Key: SPARK-12332
> URL: https://issues.apache.org/jira/browse/SPARK-12332
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: holdenk
>Priority: Trivial
>
> There is a minor typo (missing close bracket) inside of 
> ResetSystemProperties.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-9687) System.exit() still disrupt applications embedding Spark

2015-12-14 Thread Alberto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto closed SPARK-9687.
--

> System.exit() still disrupt applications embedding Spark
> 
>
> Key: SPARK-9687
> URL: https://issues.apache.org/jira/browse/SPARK-9687
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.4.1
>Reporter: Alberto
>Priority: Minor
>
> This issue was already reported in #SPARK-4783. It was addressed in the 
> following PR: 5492, but we are still having the same issue.
> The TaskSchedulerImpl class is now throwing a SparkException; this exception 
> is caught by the SparkUncaughtExceptionHandler, which again invokes 
> System.exit().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread RJ Nowling (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056115#comment-15056115
 ] 

RJ Nowling edited comment on SPARK-4816 at 12/14/15 3:42 PM:
-

I tested it again to make sure and ran into the same issue:

{code}
$ mvn -version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 
2014-12-14T17:29:23+00:00)
Maven home: /usr/share/apache-maven
Java version: 1.7.0_85, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85-2.6.1.2.el7_1.x86_64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-229.1.2.el7.x86_64", arch: "amd64", family: 
"unix"

$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.1.tgz
$ tar -xzvf spark-1.4.1.tgz
$ cd spark-1.4.1
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.4.0.jar | 
grep netlib-native

(No output)
{code}

If I build the head from git {{branch-1.4}} and run {{zipinfo}}:

{code}
$ git clone https://github.com/apache/spark.git spark-1.4-netlib
$ cd spark-1.4-netlib
$ git checkout origin/branch-1.4
$ git log | head
commit c7c99857d47e4ca8373ee9ac59e108a9c443dd05
Author: Sean Owen 
Date:   Tue Dec 8 14:34:47 2015 +

[SPARK-11652][CORE] Remote code execution with InvokerTransformer

Fix commons-collection group ID to commons-collections for version 3.x

Patches earlier PR at https://github.com/apache/spark/pull/9731

$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.3-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native
netlib-native_ref-osx-x86_64.jnilib
netlib-native_ref-osx-x86_64.jnilib.asc
netlib-native_ref-osx-x86_64.pom
netlib-native_ref-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.properties
netlib-native_ref-linux-x86_64.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.properties
netlib-native_ref-linux-i686.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.properties
netlib-native_ref-win-x86_64.dll
netlib-native_ref-win-x86_64.dll.asc
netlib-native_ref-win-x86_64.pom
netlib-native_ref-win-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.properties
netlib-native_ref-win-i686.dll
netlib-native_ref-win-i686.dll.asc
netlib-native_ref-win-i686.pom
netlib-native_ref-win-i686.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.properties
netlib-native_ref-linux-armhf.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.properties
netlib-native_system-osx-x86_64.jnilib
netlib-native_system-osx-x86_64.jnilib.asc
netlib-native_system-osx-x86_64.pom
netlib-native_system-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.properties
netlib-native_system-linux-x86_64.pom.asc
netlib-native_system-linux-x86_64.pom
netlib-native_system-linux-x86_64.so
netlib-native_system-linux-x86_64.so.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.properties
netlib-native_system-linux-i686.pom
netlib-native_system-linux-i686.so.asc
netlib-native_system-linux-i686.pom.asc
netlib-native_system-linux-i686.so
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.properties
netlib-native_system-linux-armhf.pom
netlib-native_system-linux-armhf.so.asc

[jira] [Closed] (SPARK-12322) recompute a cached RDD partition when getting its block fails

2015-12-14 Thread Lianhui Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lianhui Wang closed SPARK-12322.

Resolution: Invalid

> recompute a cached RDD partition when getting its block fails
> --
>
> Key: SPARK-12322
> URL: https://issues.apache.org/jira/browse/SPARK-12322
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Lianhui Wang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11948) Permanent UDF not work

2015-12-14 Thread Ewan Leith (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056155#comment-15056155
 ] 

Ewan Leith commented on SPARK-11948:


I think this is a duplicate of SPARK-11609?

> Permanent UDF not work
> --
>
> Key: SPARK-11948
> URL: https://issues.apache.org/jira/browse/SPARK-11948
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Weizhong
>Priority: Minor
>
> We create a function,
> {noformat}
> add jar /home/test/smartcare-udf-0.0.1-SNAPSHOT.jar;
> create function arr_greater_equal as 
> 'smartcare.dac.hive.udf.UDFArrayGreaterEqual';
> {noformat}
>  but "show functions" doesn't display it, and when we create the same function 
> again, it throws an exception as below:
> {noformat}
> Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: 
> Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask. 
> AlreadyExistsException(message:Function arr_greater_equal already exists) 
> (state=,code=0)
> {noformat}
> But if we use this function, it throws an exception as below:
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: undefined function 
> arr_greater_equal; line 1 pos 119 (state=,code=0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread RJ Nowling (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056168#comment-15056168
 ] 

RJ Nowling commented on SPARK-4816:
---

I think issue [SPARK-9507] fixed the issue. I checked out git commit 
5ad9f950c4bd0042d79cdccb5277c10f8412be85 (the commit before 
[https://github.com/apache/spark/commit/b53ca247d4a965002a9f31758ea2b28fe117d45f])
 and found that the {{netlib-native}} libraries were missing:

{code}
$ git checkout 5ad9f950c4bd0042d79cdccb5277c10f8412be85
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native

(No output
{code}

As such, the changes in [SPARK-8819] might have been the original cause.

> Maven profile netlib-lgpl does not work
> ---
>
> Key: SPARK-4816
> URL: https://issues.apache.org/jira/browse/SPARK-4816
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
> Environment: maven 3.0.5 / Ubuntu
>Reporter: Guillaume Pitel
>Priority: Minor
> Fix For: 1.1.1
>
>
> When doing what the documentation recommends to recompile Spark with the Netlib 
> native system binding (i.e. to bind with OpenBLAS or, in my case, MKL): 
> mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests 
> clean package
> The resulting assembly jar still lacked the netlib-system class. (I checked 
> the content of spark-assembly...jar.)
> When forcing the netlib-lgpl profile in the MLlib package to be active, the jar 
> is built correctly.
> So I guess it's a problem with the way Maven passes profile activations to 
> child modules.
> Also, despite the documentation claiming that if the job's jar contains 
> netlib with the necessary bindings it should work, it does not. The classloader 
> must be unhappy with two occurrences of netlib?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread RJ Nowling (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056168#comment-15056168
 ] 

RJ Nowling edited comment on SPARK-4816 at 12/14/15 4:16 PM:
-

I think [SPARK-9507] fixed the issue. I checked out git commit 
{{5ad9f950c4bd0042d79cdccb5277c10f8412be85}} (the commit before 
[https://github.com/apache/spark/commit/b53ca247d4a965002a9f31758ea2b28fe117d45f])
 and found that the {{netlib-native}} libraries were missing:

{code}
$ git checkout 5ad9f950c4bd0042d79cdccb5277c10f8412be85
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native

(No output)
{code}

I then checked out {{b53ca247d4a965002a9f31758ea2b28fe117d45f}} and built it to 
test:

{code}
zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native
netlib-native_ref-osx-x86_64.jnilib
netlib-native_ref-osx-x86_64.jnilib.asc
netlib-native_ref-osx-x86_64.pom
netlib-native_ref-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.properties
netlib-native_ref-linux-x86_64.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.properties
netlib-native_ref-linux-i686.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.properties
netlib-native_ref-win-x86_64.dll
netlib-native_ref-win-x86_64.dll.asc
netlib-native_ref-win-x86_64.pom
netlib-native_ref-win-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.properties
netlib-native_ref-win-i686.dll
netlib-native_ref-win-i686.dll.asc
netlib-native_ref-win-i686.pom
netlib-native_ref-win-i686.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.properties
netlib-native_ref-linux-armhf.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.properties
netlib-native_system-osx-x86_64.jnilib
netlib-native_system-osx-x86_64.jnilib.asc
netlib-native_system-osx-x86_64.pom
netlib-native_system-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.properties
netlib-native_system-linux-x86_64.pom.asc
netlib-native_system-linux-x86_64.pom
netlib-native_system-linux-x86_64.so
netlib-native_system-linux-x86_64.so.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.properties
netlib-native_system-linux-i686.pom
netlib-native_system-linux-i686.so.asc
netlib-native_system-linux-i686.pom.asc
netlib-native_system-linux-i686.so
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.properties
netlib-native_system-linux-armhf.pom
netlib-native_system-linux-armhf.so.asc
netlib-native_system-linux-armhf.pom.asc
netlib-native_system-linux-armhf.so
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/pom.properties
netlib-native_system-win-x86_64.dll
netlib-native_system-win-x86_64.dll.asc
netlib-native_system-win-x86_64.pom
netlib-native_system-win-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/pom.properties
netlib-native_system-win-i686.dll
netlib-native_system-win-i686.dll.asc

[jira] [Comment Edited] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread RJ Nowling (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056168#comment-15056168
 ] 

RJ Nowling edited comment on SPARK-4816 at 12/14/15 4:19 PM:
-

I think [SPARK-9507] fixed the issue. I checked out git commit 
{{5ad9f950c4bd0042d79cdccb5277c10f8412be85}} (the commit before 
[https://github.com/apache/spark/commit/b53ca247d4a965002a9f31758ea2b28fe117d45f])
 and found that the {{netlib-native}} libraries were missing:

{code}
$ git checkout 5ad9f950c4bd0042d79cdccb5277c10f8412be85
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native

(No output)
{code}

I then checked out {{b53ca247d4a965002a9f31758ea2b28fe117d45f}} and built it to 
test:

{code}
$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native
netlib-native_ref-osx-x86_64.jnilib
netlib-native_ref-osx-x86_64.jnilib.asc
netlib-native_ref-osx-x86_64.pom
netlib-native_ref-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.properties
netlib-native_ref-linux-x86_64.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.properties
netlib-native_ref-linux-i686.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.properties
netlib-native_ref-win-x86_64.dll
netlib-native_ref-win-x86_64.dll.asc
netlib-native_ref-win-x86_64.pom
netlib-native_ref-win-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.properties
netlib-native_ref-win-i686.dll
netlib-native_ref-win-i686.dll.asc
netlib-native_ref-win-i686.pom
netlib-native_ref-win-i686.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.properties
netlib-native_ref-linux-armhf.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.properties
netlib-native_system-osx-x86_64.jnilib
netlib-native_system-osx-x86_64.jnilib.asc
netlib-native_system-osx-x86_64.pom
netlib-native_system-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.properties
netlib-native_system-linux-x86_64.pom.asc
netlib-native_system-linux-x86_64.pom
netlib-native_system-linux-x86_64.so
netlib-native_system-linux-x86_64.so.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.properties
netlib-native_system-linux-i686.pom
netlib-native_system-linux-i686.so.asc
netlib-native_system-linux-i686.pom.asc
netlib-native_system-linux-i686.so
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.properties
netlib-native_system-linux-armhf.pom
netlib-native_system-linux-armhf.so.asc
netlib-native_system-linux-armhf.pom.asc
netlib-native_system-linux-armhf.so
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/pom.properties
netlib-native_system-win-x86_64.dll
netlib-native_system-win-x86_64.dll.asc
netlib-native_system-win-x86_64.pom
netlib-native_system-win-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/pom.properties
netlib-native_system-win-i686.dll
netlib-native_system-win-i686.dll.asc

[jira] [Comment Edited] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2015-12-14 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055633#comment-15055633
 ] 

Michael Han edited comment on SPARK-2356 at 12/14/15 9:05 AM:
--

Hello Everyone,

I encountered this issue again today when I tried to create a cluster using two 
Windows 7 (64-bit) desktops.
This error happens when I register the second worker to the master using the 
following command:
spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077

Strangely, it works fine when I register the first worker to the master.
Does anyone know a workaround to fix this issue?
The workaround above works fine when I use local mode.
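
A commonly suggested mitigation for the winutils error (sketched below; not verified on this cluster, the path is a placeholder) is to install winutils.exe locally and point hadoop.home.dir, or the HADOOP_HOME environment variable, at its parent directory before Hadoop's Shell class is loaded, e.g. at the top of a spark-shell session or driver program:

{code}
// Assumption: winutils.exe has been placed at C:\hadoop\bin\winutils.exe (placeholder path).
// This must run before the first Hadoop call, otherwise the
// "Could not locate executable null\bin\winutils.exe" lookup has already happened.
System.setProperty("hadoop.home.dir", "C:\\hadoop")
{code}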

The error is:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/14 16:49:22 WARN NativeCodeLoader: Unable to load native-hadoop library fo
r your platform... using builtin-java classes where applicable
15/12/14 16:49:22 ERROR Shell: Failed to locate the winutils binary in the hadoo
p binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Ha
doop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.(Shell.java:363)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)

at org.apache.hadoop.security.Groups.(Groups.java:86)
at org.apache.hadoop.security.Groups.(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Group
s.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupI
nformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(Use
rGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(
UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGrou
pInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGr
oupInformation.java:621)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils
.scala:2091)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils
.scala:2091)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2091)
at org.apache.spark.SecurityManager.(SecurityManager.scala:212)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.
scala:692)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:674)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
15/12/14 16:49:22 INFO SecurityManager: Changing view acls to: mh6
15/12/14 16:49:22 INFO SecurityManager: Changing modify acls to: mh6
15/12/14 16:49:22 INFO SecurityManager: SecurityManager: authentication disabled
; ui acls disabled; users with view permissions: Set(mh6); users with modify per
missions: Set(mh6)
15/12/14 16:49:23 INFO Slf4jLogger: Slf4jLogger started
15/12/14 16:49:23 INFO Remoting: Starting remoting
15/12/14 16:49:24 INFO Remoting: Remoting started; listening on addresses :[akka
.tcp://sparkWorker@167.3.129.160:46862]
15/12/14 16:49:24 INFO Utils: Successfully started service 'sparkWorker' on port
 46862.
15/12/14 16:49:24 INFO Worker: Starting Spark worker 167.3.129.160:46862 with 4
cores, 2.9 GB RAM
15/12/14 16:49:24 INFO Worker: Running Spark version 1.5.2
15/12/14 16:49:24 INFO Worker: Spark home: C:\spark-1.5.2-bin-hadoop2.6\bin\..
15/12/14 16:49:24 INFO Utils: Successfully started service 'WorkerUI' on port 80
81.
15/12/14 16:49:24 INFO WorkerWebUI: Started WorkerWebUI at http://167.3.129.160:
8081
15/12/14 16:49:24 INFO Worker: Connecting to master 192.168.79.1:7077...
15/12/14 16:49:39 INFO Worker: Retrying connection to master (attempt # 1)
15/12/14 16:49:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thr
ead Thread[sparkWorker-akka.actor.default-dispatcher-2,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.Futur
eTask@3ef5e68c rejected from java.util.concurrent.ThreadPoolExecutor@741cb720[Ru
nning, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]

at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution
(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.jav
a:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.ja
va:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorS
ervice.java:112)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deplo

[jira] [Created] (SPARK-12322) recompute a cached RDD partition when getting its block fails

2015-12-14 Thread Lianhui Wang (JIRA)
Lianhui Wang created SPARK-12322:


 Summary: recompute a cached RDD partition when getting its block 
fails
 Key: SPARK-12322
 URL: https://issues.apache.org/jira/browse/SPARK-12322
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Lianhui Wang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11948) Permanent UDF not work

2015-12-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11948.
---
  Resolution: Duplicate
Target Version/s:   (was: 1.6.0)

> Permanent UDF not work
> --
>
> Key: SPARK-11948
> URL: https://issues.apache.org/jira/browse/SPARK-11948
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Weizhong
>Priority: Minor
>
> We create a function,
> {noformat}
> add jar /home/test/smartcare-udf-0.0.1-SNAPSHOT.jar;
> create function arr_greater_equal as 
> 'smartcare.dac.hive.udf.UDFArrayGreaterEqual';
> {noformat}
>  but "show functions" doesn't display it, and when we create the same function 
> again, it throws an exception as below:
> {noformat}
> Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: 
> Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask. 
> AlreadyExistsException(message:Function arr_greater_equal already exists) 
> (state=,code=0)
> {noformat}
> But if we use this function, it throws an exception as below:
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: undefined function 
> arr_greater_equal; line 1 pos 119 (state=,code=0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12219) Spark 1.5.2 code does not build on Scala 2.11.7 with SBT assembly

2015-12-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056172#comment-15056172
 ] 

Sean Owen commented on SPARK-12219:
---

[~RodBoavida] I'm not able to reproduce these compilation failures. In fact it 
looks like these were already resolved by 
https://github.com/apache/spark/pull/9126  Do you see the same on {{master}} -- 
does it work?

For example {{UnionRDD.scala:40}} is already {{@transient private val rdd: 
RDD[T]}}

It may be a question of back-porting those changes to 1.5.x. It may not be as 
high priority since this is already fixed in 1.6.
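
For reference, a minimal sketch of the pattern involved (the class name below is made up, and the Scala 2.11 behaviour is paraphrased, so treat the details as an assumption rather than the exact diff):

{code}
import org.apache.spark.rdd.RDD

// scalac 2.11 discards an annotation placed on a bare constructor parameter and emits a
// warning, which fails builds that treat warnings as errors. Declaring the parameter as a
// private val gives @transient a real field to apply to:
//   before: class UnionishPartition[T](@transient rdd: RDD[T], val idx: Int)
//   after:
class UnionishPartition[T](@transient private val rdd: RDD[T], val idx: Int)
  extends Serializable
{code}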

> Spark 1.5.2 code does not build on Scala 2.11.7 with SBT assembly
> -
>
> Key: SPARK-12219
> URL: https://issues.apache.org/jira/browse/SPARK-12219
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.2
>Reporter: Rodrigo Boavida
>
> I've tried with no success to build Spark on Scala 2.11.7. I'm getting build 
> errors using sbt due to the issues found in the thread below from July of this 
> year.
> https://mail-archives.apache.org/mod_mbox/spark-dev/201507.mbox/%3CCA+3qhFSJGmZToGmBU1=ivy7kr6eb7k8t6dpz+ibkstihryw...@mail.gmail.com%3E
> It seems some minor fixes are needed to make the Scala 2.11 compiler happy.
> I needed to build with SBT, as suggested in the thread below, to get around an 
> apparent Maven shade plugin issue which changed some classes when I changed 
> to Akka 2.4.0.
> https://groups.google.com/forum/#!topic/akka-user/iai6whR6-xU
> I've set this bug to Major priority assuming that the Spark community wants 
> to keep fully supporting SBT builds, including Scala 2.11 compatibility.
> Thanks,
> Rod



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12219) Spark 1.5.2 code does not build on Scala 2.11.7 with SBT assembly

2015-12-14 Thread Rodrigo Boavida (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056180#comment-15056180
 ] 

Rodrigo Boavida commented on SPARK-12219:
-

Sean - that's great news. Unfortunately I haven't had time to check on the 
errors. I will get the latest copy of 1.6, build it again, and let you know.

Cheers,
Rod



> Spark 1.5.2 code does not build on Scala 2.11.7 with SBT assembly
> -
>
> Key: SPARK-12219
> URL: https://issues.apache.org/jira/browse/SPARK-12219
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.2
>Reporter: Rodrigo Boavida
>
> I've tried with no success to build Spark on Scala 2.11.7. I'm getting build 
> errors using sbt due to the issues found in the thread below from July of this 
> year.
> https://mail-archives.apache.org/mod_mbox/spark-dev/201507.mbox/%3CCA+3qhFSJGmZToGmBU1=ivy7kr6eb7k8t6dpz+ibkstihryw...@mail.gmail.com%3E
> It seems some minor fixes are needed to make the Scala 2.11 compiler happy.
> I needed to build with SBT, as suggested in the thread below, to get around an 
> apparent Maven shade plugin issue which changed some classes when I changed 
> to Akka 2.4.0.
> https://groups.google.com/forum/#!topic/akka-user/iai6whR6-xU
> I've set this bug to Major priority assuming that the Spark community wants 
> to keep fully supporting SBT builds, including Scala 2.11 compatibility.
> Thanks,
> Rod



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056122#comment-15056122
 ] 

Sean Owen commented on SPARK-4816:
--

I did the same pretty much down to the letter and found the netlib-native 
artifacts in the assembly JAR. I'll try your exact steps later to see what I 
can see.

> Maven profile netlib-lgpl does not work
> ---
>
> Key: SPARK-4816
> URL: https://issues.apache.org/jira/browse/SPARK-4816
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
> Environment: maven 3.0.5 / Ubuntu
>Reporter: Guillaume Pitel
>Priority: Minor
> Fix For: 1.1.1
>
>
> When doing what the documentation recommends to recompile Spark with the Netlib 
> native system binding (i.e. to bind with OpenBLAS or, in my case, MKL): 
> mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests 
> clean package
> The resulting assembly jar still lacked the netlib-system class. (I checked 
> the content of spark-assembly...jar.)
> When forcing the netlib-lgpl profile in the MLlib package to be active, the jar 
> is built correctly.
> So I guess it's a problem with the way Maven passes profile activations to 
> child modules.
> Also, despite the documentation claiming that if the job's jar contains 
> netlib with the necessary bindings it should work, it does not. The classloader 
> must be unhappy with two occurrences of netlib?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread RJ Nowling (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

RJ Nowling reopened SPARK-4816:
---

> Maven profile netlib-lgpl does not work
> ---
>
> Key: SPARK-4816
> URL: https://issues.apache.org/jira/browse/SPARK-4816
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
> Environment: maven 3.0.5 / Ubuntu
>Reporter: Guillaume Pitel
>Priority: Minor
> Fix For: 1.1.1
>
>
> When doing what the documentation recommends to recompile Spark with the Netlib 
> native system binding (i.e. to bind with OpenBLAS or, in my case, MKL): 
> mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests 
> clean package
> The resulting assembly jar still lacked the netlib-system class. (I checked 
> the content of spark-assembly...jar.)
> When forcing the netlib-lgpl profile in the MLlib package to be active, the jar 
> is built correctly.
> So I guess it's a problem with the way Maven passes profile activations to 
> child modules.
> Also, despite the documentation claiming that if the job's jar contains 
> netlib with the necessary bindings it should work, it does not. The classloader 
> must be unhappy with two occurrences of netlib?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread RJ Nowling (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056115#comment-15056115
 ] 

RJ Nowling commented on SPARK-4816:
---

I tested it again to make sure and ran into the same issue:

{code}
$ mvn -version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 
2014-12-14T17:29:23+00:00)
Maven home: /usr/share/apache-maven
Java version: 1.7.0_85, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85-2.6.1.2.el7_1.x86_64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-229.1.2.el7.x86_64", arch: "amd64", family: 
"unix"

$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.1.tgz
$ tar -xzvf spark-1.4.1.tgz
$ cd spark-1.4.1
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.4.0.jar | 
grep netlib-native

(No output)
{code}

If I build the head from git {{branch-1.4}} and run {{zipinfo}}:

{code}
$ git log | head
commit c7c99857d47e4ca8373ee9ac59e108a9c443dd05
Author: Sean Owen 
Date:   Tue Dec 8 14:34:47 2015 +

[SPARK-11652][CORE] Remote code execution with InvokerTransformer

Fix commons-collection group ID to commons-collections for version 3.x

Patches earlier PR at https://github.com/apache/spark/pull/9731

$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.3-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native
netlib-native_ref-osx-x86_64.jnilib
netlib-native_ref-osx-x86_64.jnilib.asc
netlib-native_ref-osx-x86_64.pom
netlib-native_ref-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.properties
netlib-native_ref-linux-x86_64.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.properties
netlib-native_ref-linux-i686.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.properties
netlib-native_ref-win-x86_64.dll
netlib-native_ref-win-x86_64.dll.asc
netlib-native_ref-win-x86_64.pom
netlib-native_ref-win-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.properties
netlib-native_ref-win-i686.dll
netlib-native_ref-win-i686.dll.asc
netlib-native_ref-win-i686.pom
netlib-native_ref-win-i686.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.properties
netlib-native_ref-linux-armhf.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.properties
netlib-native_system-osx-x86_64.jnilib
netlib-native_system-osx-x86_64.jnilib.asc
netlib-native_system-osx-x86_64.pom
netlib-native_system-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.properties
netlib-native_system-linux-x86_64.pom.asc
netlib-native_system-linux-x86_64.pom
netlib-native_system-linux-x86_64.so
netlib-native_system-linux-x86_64.so.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.properties
netlib-native_system-linux-i686.pom
netlib-native_system-linux-i686.so.asc
netlib-native_system-linux-i686.pom.asc
netlib-native_system-linux-i686.so
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.properties
netlib-native_system-linux-armhf.pom
netlib-native_system-linux-armhf.so.asc
netlib-native_system-linux-armhf.pom.asc
netlib-native_system-linux-armhf.so
META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/

[jira] [Comment Edited] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread RJ Nowling (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056168#comment-15056168
 ] 

RJ Nowling edited comment on SPARK-4816 at 12/14/15 3:57 PM:
-

I think issue [SPARK-9507] fixed the issue. I checked out git commit 
5ad9f950c4bd0042d79cdccb5277c10f8412be85 (the commit before 
[https://github.com/apache/spark/commit/b53ca247d4a965002a9f31758ea2b28fe117d45f])
 and found that the {{netlib-native}} libraries were missing:

{code}
$ git checkout 5ad9f950c4bd0042d79cdccb5277c10f8412be85
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native

(No output)
{code}

As such, the changes in [SPARK-8819] might have been the original cause.


was (Author: rnowling):
I think issue [SPARK-9507] fixed the issue. I checked out git commit 
5ad9f950c4bd0042d79cdccb5277c10f8412be85 (the commit before 
[https://github.com/apache/spark/commit/b53ca247d4a965002a9f31758ea2b28fe117d45f])
 and found that the {{netlib-native}} libraries were missing:

{code}
$ git checkout 5ad9f950c4bd0042d79cdccb5277c10f8412be85
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native

(No output
{code}

As such, the changes in [SPARK-8819] might have been the original cause.

> Maven profile netlib-lgpl does not work
> ---
>
> Key: SPARK-4816
> URL: https://issues.apache.org/jira/browse/SPARK-4816
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
> Environment: maven 3.0.5 / Ubuntu
>Reporter: Guillaume Pitel
>Priority: Minor
> Fix For: 1.1.1
>
>
> When doing what the documentation recommends to recompile Spark with the Netlib 
> native system binding (i.e. to bind with OpenBLAS or, in my case, MKL): 
> mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests 
> clean package
> The resulting assembly jar still lacked the netlib-system class. (I checked 
> the content of spark-assembly...jar.)
> When forcing the netlib-lgpl profile in the MLlib package to be active, the jar 
> is built correctly.
> So I guess it's a problem with the way Maven passes profile activations to 
> child modules.
> Also, despite the documentation claiming that if the job's jar contains 
> netlib with the necessary bindings it should work, it does not. The classloader 
> must be unhappy with two occurrences of netlib?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread RJ Nowling (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056168#comment-15056168
 ] 

RJ Nowling edited comment on SPARK-4816 at 12/14/15 3:58 PM:
-

I think [SPARK-9507] fixed the issue. I checked out git commit 
{{5ad9f950c4bd0042d79cdccb5277c10f8412be85}} (the commit before 
[https://github.com/apache/spark/commit/b53ca247d4a965002a9f31758ea2b28fe117d45f])
 and found that the {{netlib-native}} libraries were missing:

{code}
$ git checkout 5ad9f950c4bd0042d79cdccb5277c10f8412be85
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native

(No output)
{code}

As such, the changes in [SPARK-8819] might have been the original cause.


was (Author: rnowling):
I think issue [SPARK-9507] fixed the issue. I checked out git commit 
5ad9f950c4bd0042d79cdccb5277c10f8412be85 (the commit before 
[https://github.com/apache/spark/commit/b53ca247d4a965002a9f31758ea2b28fe117d45f])
 and found that the {{netlib-native}} libraries were missing:

{code}
$ git checkout 5ad9f950c4bd0042d79cdccb5277c10f8412be85
$ mvn -Pnetlib-lgpl -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean 
package
$ zipinfo -1 
assembly/target/scala-2.10/spark-assembly-1.4.2-SNAPSHOT-hadoop2.4.0.jar | grep 
netlib-native

(No output)
{code}

As such, the changes in [SPARK-8819] might have been the original cause.

> Maven profile netlib-lgpl does not work
> ---
>
> Key: SPARK-4816
> URL: https://issues.apache.org/jira/browse/SPARK-4816
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
> Environment: maven 3.0.5 / Ubuntu
>Reporter: Guillaume Pitel
>Priority: Minor
> Fix For: 1.1.1
>
>
> When doing what the documentation recommends to recompile Spark with the Netlib 
> native system binding (i.e. to bind with OpenBLAS or, in my case, MKL): 
> mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests 
> clean package
> The resulting assembly jar still lacked the netlib-system class. (I checked 
> the content of spark-assembly...jar.)
> When forcing the netlib-lgpl profile in the MLlib package to be active, the jar 
> is built correctly.
> So I guess it's a problem with the way Maven passes profile activations to 
> child modules.
> Also, despite the documentation claiming that if the job's jar contains 
> netlib with the necessary bindings it should work, it does not. The classloader 
> must be unhappy with two occurrences of netlib?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11255) R Test build should run on R 3.1.1

2015-12-14 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057368#comment-15057368
 ] 

Felix Cheung commented on SPARK-11255:
--

Is this test error caused by this?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47673/consoleFull
{code}
1. Error: include inside function --
(converted from warning) package 'plyr' was built under R version 3.2.1
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, 
message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: suppressPackageStartupMessages(library(plyr)) at test_includePackage.R:30
5: withCallingHandlers(expr, packageStartupMessage = function(c) 
invokeRestart("muffleMessage"))
6: library(plyr)
7: testRversion(pkgInfo, package, pkgpath)
8: warning(gettextf("package %s was built under R version %s", sQuote(pkgname), 
as.character(built$R)), 
   call. = FALSE, domain = NA)
9: .signalSimpleWarning("package 'plyr' was built under R version 3.2.1", 
quote(NULL))
10: withRestarts({
   .Internal(.signalCondition(simpleWarning(msg, call), msg, call))
   .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
11: withOneRestart(expr, restarts[[1L]])
12: doWithOneRestart(return(expr), restart)
{code}

> R Test build should run on R 3.1.1
> --
>
> Key: SPARK-11255
> URL: https://issues.apache.org/jira/browse/SPARK-11255
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Felix Cheung
>Assignee: shane knapp
>Priority: Minor
>
> Tests should run on R 3.1.1, which is the version listed as supported.
> Apparently there are a few R changes that can go undetected since the Jenkins 
> test build is running something newer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12327) lint-r checks fail with commented code

2015-12-14 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057364#comment-15057364
 ] 

Felix Cheung commented on SPARK-12327:
--

In fact [~yu_ishikawa] did open a PR for a subset of these cases.
BTW, why are these showing up now? I saw them before when running lint-r 
locally, but they didn't fail Jenkins then.


> lint-r checks fail with commented code
> --
>
> Key: SPARK-12327
> URL: https://issues.apache.org/jira/browse/SPARK-12327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> We get this after our R version downgrade
> {code}
> R/RDD.R:183:68: style: Commented code should be removed.
> rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # 
> rddRef$asJavaRDD()
>
> ^~
> R/RDD.R:228:63: style: Commented code should be removed.
> #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence.
>   ^~~~
> R/RDD.R:388:24: style: Commented code should be removed.
> #' collectAsMap(rdd) # list(`1` = 2, `3` = 4)
>^~
> R/RDD.R:603:61: style: Commented code should be removed.
> #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2)
> ^~~~
> R/RDD.R:762:20: style: Commented code should be removed.
> #' take(rdd, 2L) # list(1, 2)
>^~
> R/RDD.R:830:42: style: Commented code should be removed.
> #' sort(unlist(collect(distinct(rdd # c(1, 2, 3)
>  ^~~
> R/RDD.R:980:47: style: Commented code should be removed.
> #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), 
> list(9, 3))
>   
> ^~~~
> R/RDD.R:1194:27: style: Commented code should be removed.
> #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6)
>   ^~
> R/RDD.R:1215:19: style: Commented code should be removed.
> #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4)
>   ^~~
> R/RDD.R:1270:50: style: Commented code should be removed.
> #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4)
>  ^~~
> R/RDD.R:1374:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 
> 2))
>  
> ^~
> R/RDD.R:1415:6: style: Commented code should be removed.
> #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 
> 4))
>  
> ^~
> R/RDD.R:1461:6: style: Commented code should be removed.
> #' # list(list(1, 2), list(3, 4))
>  ^~~~
> R/RDD.R:1527:6: style: Commented code should be removed.
> #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 
> 1004))
>  
> ^~~
> R/RDD.R:1564:6: style: Commented code should be removed.
> #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2))
>  ^~~~
> R/RDD.R:1595:6: style: Commented code should be removed.
> #' # list(1, 1, 3)
>  ^
> R/RDD.R:1627:6: style: Commented code should be removed.
> #' # list(1, 2, 3)
>  ^
> R/RDD.R:1663:6: style: Commented code should be removed.
> #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
>  ^~
> R/deserialize.R:22:3: style: Commented code should be removed.
> # void -> NULL
>   ^~~~
> R/deserialize.R:23:3: style: Commented code should be removed.
> # Int -> integer
>   ^~
> R/deserialize.R:24:3: style: Commented code should be removed.
> # String -> character
>   ^~~
> R/deserialize.R:25:3: style: Commented code should be removed.
> # Boolean -> logical
>   ^~
> R/deserialize.R:26:3: style: Commented code should be removed.
> # Float -> double
>   ^~~
> R/deserialize.R:27:3: style: Commented code should be removed.
> # Double -> double
>   ^~~~
> R/deserialize.R:28:3: style: Commented code should be removed.
> # Long -> double
>   ^~
> R/deserialize.R:29:3: style: Commented code should be removed.
> # Array[Byte] -> raw
>   ^~
> R/deserialize.R:30:3: style: Commented code should be removed.
> # Date -> Date
>   

[jira] [Comment Edited] (SPARK-11255) R Test build should run on R 3.1.1

2015-12-14 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057374#comment-15057374
 ] 

Sun Rui edited comment on SPARK-11255 at 12/15/15 5:31 AM:
---

It seems the 'plyr' package has to be rebuilt under R 3.1.1. But is 'plyr' required to 
run SparkR? Could it simply be uninstalled from R?


was (Author: sunrui):
It seems the 'plyr' package has to be rebuilt under R 3.1.1.

> R Test build should run on R 3.1.1
> --
>
> Key: SPARK-11255
> URL: https://issues.apache.org/jira/browse/SPARK-11255
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Felix Cheung
>Assignee: shane knapp
>Priority: Minor
>
> Tests should run on R 3.1.1, which is the version listed as supported.
> Apparently there are a few R changes that can go undetected since the Jenkins 
> test build is running something newer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11255) R Test build should run on R 3.1.1

2015-12-14 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057374#comment-15057374
 ] 

Sun Rui commented on SPARK-11255:
-

It seems the 'plyr' package has to be rebuilt under R 3.1.1.

> R Test build should run on R 3.1.1
> --
>
> Key: SPARK-11255
> URL: https://issues.apache.org/jira/browse/SPARK-11255
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Felix Cheung
>Assignee: shane knapp
>Priority: Minor
>
> Tests should run on R 3.1.1, which is the version listed as supported.
> Apparently there are a few R changes that can go undetected since the Jenkins 
> test build is running something newer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12332) Typo in ResetSystemProperties.scala's comments

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12332:


Assignee: Apache Spark

> Typo in ResetSystemProperties.scala's comments
> --
>
> Key: SPARK-12332
> URL: https://issues.apache.org/jira/browse/SPARK-12332
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> There is a minor typo (missing close bracket) inside of 
> ResetSystemProperties.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%

2015-12-14 Thread jeffonia Tung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057400#comment-15057400
 ] 

jeffonia Tung commented on SPARK-4049:
--

I hit the same case: the fraction cached goes up to 200%, and the whole system keeps 
running fine! I am just confused about that.

> Storage web UI "fraction cached" shows as > 100%
> 
>
> Key: SPARK-4049
> URL: https://issues.apache.org/jira/browse/SPARK-4049
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> In the Storage tab of the Spark Web UI, I saw a case where the "Fraction 
> Cached" was greater than 100%:
> !http://i.imgur.com/Gm2hEeL.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%

2015-12-14 Thread Warren Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057500#comment-15057500
 ] 

Warren Wang commented on SPARK-4049:


Actually, the fraction cached does not get inflated when spark.speculation is 
disabled in version 1.5.1. Speculation launches extra task attempts for the 
slower tasks. But I still think the inflated fractions need to be cleared.
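
For illustration only, here is a small sketch of one way a "fraction cached" style metric can exceed 100% (this is an assumption about the mechanism, not a reading of the actual Web UI code): if every reported block replica or task attempt is counted instead of every distinct partition, duplicates inflate the numerator.

{code}
case class BlockReport(rddId: Int, partition: Int)

// naive count of reports: duplicates (e.g. from speculative attempts) push it past 100%
def fractionCached(reports: Seq[BlockReport], numPartitions: Int): Double =
  100.0 * reports.size / numPartitions

// de-duplicated by partition: capped at 100%
def fractionCachedDeduped(reports: Seq[BlockReport], numPartitions: Int): Double =
  100.0 * reports.map(_.partition).distinct.size / numPartitions

val reports = Seq(BlockReport(1, 0), BlockReport(1, 0), BlockReport(1, 1), BlockReport(1, 1))
println(fractionCached(reports, 2))        // 200.0 -- the symptom reported above
println(fractionCachedDeduped(reports, 2)) // 100.0
{code}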

> Storage web UI "fraction cached" shows as > 100%
> 
>
> Key: SPARK-4049
> URL: https://issues.apache.org/jira/browse/SPARK-4049
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> In the Storage tab of the Spark Web UI, I saw a case where the "Fraction 
> Cached" was greater than 100%:
> !http://i.imgur.com/Gm2hEeL.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12333) Support shuffle spill encryption in Spark

2015-12-14 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created SPARK-12333:


 Summary: Support shuffle spill encryption in Spark
 Key: SPARK-12333
 URL: https://issues.apache.org/jira/browse/SPARK-12333
 Project: Spark
  Issue Type: New Feature
  Components: Shuffle
Reporter: Ferdinand Xu


Like shuffle file encryption in SPARK-5682, spill data should also be 
encrypted.
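
As a rough illustration of the idea (an assumption-level sketch, not the proposed Spark implementation; key management and cipher mode are deliberately simplistic), spill writes could be routed through a CipherOutputStream so bytes reach disk encrypted:

{code}
import java.io.{FileOutputStream, OutputStream}
import javax.crypto.{Cipher, CipherOutputStream, KeyGenerator}

object SpillEncryptionSketch {
  // Wrap the stream that writes a spill file so bytes reach disk encrypted.
  def encryptedSpillStream(path: String): OutputStream = {
    val key    = KeyGenerator.getInstance("AES").generateKey() // throwaway per-spill key
    val cipher = Cipher.getInstance("AES")                     // default mode/padding, for illustration
    cipher.init(Cipher.ENCRYPT_MODE, key)
    new CipherOutputStream(new FileOutputStream(path), cipher)
  }

  def main(args: Array[String]): Unit = {
    val out = encryptedSpillStream("/tmp/spill-0.data")
    out.write("spilled records".getBytes("UTF-8"))
    out.close()
  }
}
{code}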



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8402) Add DP means clustering to MLlib

2015-12-14 Thread Meethu Mathew (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meethu Mathew updated SPARK-8402:
-
Summary: Add DP means clustering to MLlib  (was: DP means clustering )

> Add DP means clustering to MLlib
> 
>
> Key: SPARK-8402
> URL: https://issues.apache.org/jira/browse/SPARK-8402
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Meethu Mathew
>Assignee: Meethu Mathew
>  Labels: features
>
> At present, all the clustering algorithms in MLlib require the number of 
> clusters to be specified in advance. 
> The Dirichlet process (DP) is a popular non-parametric Bayesian mixture model 
> that allows for flexible clustering of data without having to specify a priori 
> the number of clusters. 
> DP means is a non-parametric clustering algorithm that uses a scale parameter 
> 'lambda' to control the creation of new clusters ["Revisiting k-means: New 
> Algorithms via Bayesian Nonparametrics" by Brian Kulis, Michael I. Jordan].
> We have followed the distributed implementation of DP means which has been 
> proposed in the paper titled "MLbase: Distributed Machine Learning Made Easy" 
> by Xinghao Pan, Evan R. Sparks, Andre Wibisono.
> A benchmark comparison between k-means and DP-means based on Normalized 
> Mutual Information between ground-truth clusters and algorithm outputs 
> is provided in the following table. It can be seen from the table that 
> DP-means reported a higher NMI on 5 of 8 data sets in comparison to 
> k-means [Source: Kulis, B., Jordan, M.I.: Revisiting k-means: New algorithms 
> via Bayesian nonparametrics (2011) Arxiv:.0352. (Table 1)]
> | Dataset   | DP-means | k-means |
> | Wine  | .41  | .43 |
> | Iris  | .75  | .76 |
> | Pima  | .02  | .03 |
> | Soybean   | .72  | .66 |
> | Car   | .07  | .05 |
> | Balance Scale | .17  | .11 |
> | Breast Cancer | .04  | .03 |
> | Vehicle   | .18  | .18 |
> Experiment on our Spark cluster setup:
> An initial benchmark study was performed on a 3-node Spark cluster set up on 
> Mesos, where each node's config was 8 cores and 64 GB RAM, and the Spark version 
> used was 1.5 (git branch).
> Tests were done using a mixture of 10 Gaussians with varying number of 
> features and instances. The results from the benchmark study are provided 
> below. The reported stats are average over 5 runs. 
> | Instances | Dimensions | No. of clusters obtained | DP-means time | DP-means iterations to converge | k-means (k=10) time | k-means iterations to converge |
> | 10 million | 10 | 10 | 43.6s | 2 | 52.2s | 2 |
> | 1 million | 100 | 10 | 39.8s | 2 | 43.39s | 2 |
> | 0.1 million | 1000 | 10 | 37.3s | 2 | 41.64s | 2 |
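
For readers unfamiliar with the algorithm, a rough, local (non-distributed) 
sketch of the DP-means cluster-creation rule is below; the function name and 
structure are illustrative only, not the proposed MLlib implementation:

{code}
// One assignment pass: a point whose squared distance to every existing
// centre exceeds lambda seeds a new cluster; otherwise it joins the nearest.
def dpMeansCentres(points: Seq[Array[Double]], lambda: Double): Seq[Array[Double]] = {
  def dist2(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  val centres = scala.collection.mutable.ArrayBuffer(points.head)
  for (p <- points.tail) {
    if (centres.map(c => dist2(p, c)).min > lambda) centres += p
  }
  centres.toSeq
}
{code}

A full implementation alternates this step with recomputing each centre as the 
mean of its assigned points, as in k-means.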






[jira] [Created] (SPARK-12332) Typo in ResetSystemProperties.scala's comments

2015-12-14 Thread holdenk (JIRA)
holdenk created SPARK-12332:
---

 Summary: Typo in ResetSystemProperties.scala's comments
 Key: SPARK-12332
 URL: https://issues.apache.org/jira/browse/SPARK-12332
 Project: Spark
  Issue Type: Bug
Reporter: holdenk
Priority: Trivial


There is a minor typo (missing close bracket) in the comments of 
ResetSystemProperties.scala.






[jira] [Resolved] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect

2015-12-14 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12288.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10285
[https://github.com/apache/spark/pull/10285]

> Support UnsafeRow in Coalesce/Except/Intersect
> --
>
> Key: SPARK-12288
> URL: https://issues.apache.org/jira/browse/SPARK-12288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
> Fix For: 2.0.0
>
>







[jira] [Updated] (SPARK-12330) Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk

2015-12-14 Thread Charles Allen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated SPARK-12330:
--
Description: 
A simple line-count example can be launched as follows:

{code}
SPARK_HOME=/mnt/tmp/spark 
MASTER=mesos://zk://zk.metamx-prod.com:2181/mesos-druid/metrics 
./bin/spark-shell --conf spark.mesos.coarse=true --conf spark.cores.max=7 
--conf spark.mesos.executor.memoryOverhead=2048 --conf 
spark.mesos.executor.home=/mnt/tmp/spark --conf 
spark.executor.extraJavaOptions='-Duser.timezone=UTC -Dfile.encoding=UTF-8 
-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution 
-XX:+PrintFlagsFinal -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MaxDirectMemorySize=1024m 
-verbose:gc -XX:+PrintFlagsFinal -Djava.io.tmpdir=/mnt/tmp/scratch' --conf 
spark.hadoop.fs.s3n.awsAccessKeyId='REDACTED' --conf 
spark.hadoop.fs.s3n.awsSecretAccessKey='REDACTED' --conf 
spark.executor.memory=7g --conf spark.executorEnv.GLOG_v=9 --conf 
spark.storage.memoryFraction=0.0 --conf spark.shuffle.memoryFraction=0.0
{code}

In the shell the following lines can be executed:

{code}
val text_file = 
sc.textFile("s3n://REDACTED/charlesallen/tpch/lineitem.tbl").persist(org.apache.spark.storage.StorageLevel.DISK_ONLY)
{code}
{code}
text_file.map(l => 1).sum
{code}
which will result in
{code}
res0: Double = 6001215.0
{code}
for the TPCH 1GB dataset

Unfortunately, the blockmgr directory remains on the executor node after 
termination of the Spark context.

The log on the executor looks like this near the termination:

{code}
I1215 02:12:31.190878 130732 process.cpp:566] Parsed message name 
'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.30:58604 
from slave(1)@172.19.67.30:5051
I1215 02:12:31.190928 130732 process.cpp:2382] Spawned process 
__http__(4)@172.19.67.30:58604
I1215 02:12:31.190932 130721 process.cpp:2392] Resuming 
executor(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.190924800+00:00
I1215 02:12:31.190958 130702 process.cpp:2392] Resuming 
__http__(4)@172.19.67.30:58604 at 2015-12-15 02:12:31.190951936+00:00
I1215 02:12:31.190976 130721 exec.cpp:381] Executor asked to shutdown
I1215 02:12:31.190943 130727 process.cpp:2392] Resuming 
__gc__@172.19.67.30:58604 at 2015-12-15 02:12:31.190937088+00:00
I1215 02:12:31.190991 130702 process.cpp:2497] Cleaning up 
__http__(4)@172.19.67.30:58604
I1215 02:12:31.191032 130721 process.cpp:2382] Spawned process 
(2)@172.19.67.30:58604
I1215 02:12:31.191040 130702 process.cpp:2392] Resuming (2)@172.19.67.30:58604 
at 2015-12-15 02:12:31.191037952+00:00
I1215 02:12:31.191054 130702 exec.cpp:80] Scheduling shutdown of the executor
I1215 02:12:31.191069 130721 exec.cpp:396] Executor::shutdown took 21572ns
I1215 02:12:31.191073 130702 clock.cpp:260] Created a timer for 
(2)@172.19.67.30:58604 in 5secs in the future (2015-12-15 
02:12:36.191062016+00:00)
I1215 02:12:31.191066 130720 process.cpp:2392] Resuming (1)@172.19.67.30:58604 
at 2015-12-15 02:12:31.191059200+00:00
15/12/15 02:12:31 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
I1215 02:12:31.240103 130732 clock.cpp:151] Handling timers up to 2015-12-15 
02:12:31.240091136+00:00
I1215 02:12:31.240123 130732 clock.cpp:158] Have timeout(s) at 2015-12-15 
02:12:31.240036096+00:00
I1215 02:12:31.240183 130730 process.cpp:2392] Resuming 
reaper(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.240178176+00:00
I1215 02:12:31.240226 130730 clock.cpp:260] Created a timer for 
reaper(1)@172.19.67.30:58604 in 100ms in the future (2015-12-15 
02:12:31.340212992+00:00)
I1215 02:12:31.247019 130720 clock.cpp:260] Created a timer for 
(1)@172.19.67.30:58604 in 3secs in the future (2015-12-15 
02:12:34.247005952+00:00)
15/12/15 02:12:31 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: 
SIGTERM
15/12/15 02:12:31 INFO ShutdownHookManager: Shutdown hook called

no more java logs
{code}

If the shuffle fraction is NOT set to 0.0, and the data is allowed to stay in 
memory, then the following log can be seen at termination instead:
{code}
I1215 01:19:16.247705 120052 process.cpp:566] Parsed message name 
'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.24:60016 
from slave(1)@172.19.67.24:5051
I1215 01:19:16.247745 120052 process.cpp:2382] Spawned process 
__http__(4)@172.19.67.24:60016
I1215 01:19:16.247747 120034 process.cpp:2392] Resuming 
executor(1)@172.19.67.24:60016 at 2015-12-15 01:19:16.247741952+00:00
I1215 01:19:16.247758 120030 process.cpp:2392] Resuming 
__gc__@172.19.67.24:60016 at 2015-12-15 01:19:16.247755008+00:00
I1215 01:19:16.247772 120034 exec.cpp:381] Executor asked to shutdown
I1215 01:19:16.247772 120038 process.cpp:2392] Resuming 
__http__(4)@172.19.67.24:60016 at 2015-12-15 01:19:16.247767808+00:00
I1215 01:19:16.247791 120038 process.cpp:2497] 
{code}

[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6

2015-12-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055985#comment-15055985
 ] 

Sean Owen commented on SPARK-12319:
---

Do you have any more detail here -- what specifically is the test failure and 
fix?
You're referring to bit twiddling ops in BitSetMethods, but these operators 
don't have an endian-ness.

> Address endian specific problems surfaced in 1.6
> 
>
> Key: SPARK-12319
> URL: https://issues.apache.org/jira/browse/SPARK-12319
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: BE platforms
>Reporter: Adam Roberts
>Priority: Critical
>
> JIRA to cover endian specific problems - since testing 1.6 I've noticed 
> problems with DataFrames on BE platforms, e.g. 
> https://issues.apache.org/jira/browse/SPARK-9858
> [~joshrosen] [~yhuai]
> Current progress: using com.google.common.io.LittleEndianDataInputStream and 
> com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer 
> fixes three test failures in ExchangeCoordinatorSuite, but I'm concerned 
> about performance and wider functional implications.
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input 
> with reordering" fails as we expect "one, 1" but instead get "one, 9" - we 
> believe the issue lies within BitSetMethods.java, specifically around: return 
> (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); 






[jira] [Created] (SPARK-12321) JSON format for logical/physical execution plans

2015-12-14 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-12321:
---

 Summary: JSON format for logical/physical execution plans
 Key: SPARK-12321
 URL: https://issues.apache.org/jira/browse/SPARK-12321
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Wenchen Fan









[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6

2015-12-14 Thread Adam Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056040#comment-15056040
 ] 

Adam Roberts commented on SPARK-12319:
--

Hi Sean, here are the failures

ExchangeCoordinatorSuite:
- test estimatePartitionStartIndices - 1 Exchange
- test estimatePartitionStartIndices - 2 Exchanges
- test estimatePartitionStartIndices and enforce minimal number of reducers
- determining the number of reducers: aggregate 
operator(minNumPostShufflePartitions: 3)
- determining the number of reducers: join 
operator(minNumPostShufflePartitions: 3)
- determining the number of reducers: complex query 
1(minNumPostShufflePartitions: 3)

- determining the number of reducers: complex query 
2(minNumPostShufflePartitions: 3)
- determining the number of reducers: aggregate operator *** FAILED ***
  3 did not equal 2 (ExchangeCoordinatorSuite.scala:315)
- determining the number of reducers: join operator *** FAILED ***
  1 did not equal 2 (ExchangeCoordinatorSuite.scala:366)
- determining the number of reducers: complex query 1
- determining the number of reducers: complex query 2 *** FAILED ***
  Set(2) did not equal Set(2, 3) (ExchangeCoordinatorSuite.scala:472)

The fix is to replace the use of DataInput/OutputStreams with 
LittleEndianDataInput/OutputStream objects in order to have these tests pass on 
big-endian platforms.
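
Roughly, the change looks like the following (a hedged sketch using Guava's 
little-endian stream wrappers; the helper names are illustrative, not the 
actual patch to UnsafeRowSerializer):

{code}
import java.io.{InputStream, OutputStream}
import com.google.common.io.{LittleEndianDataInputStream, LittleEndianDataOutputStream}

// Write and read a 4-byte row length with a fixed (little-endian) byte order,
// so the serialized form is the same on big- and little-endian platforms.
def writeRowLength(out: OutputStream, length: Int): Unit =
  new LittleEndianDataOutputStream(out).writeInt(length)

def readRowLength(in: InputStream): Int =
  new LittleEndianDataInputStream(in).readInt()
{code}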

With regards to the Dataset failure (using DF behind the scenes and also using 
the tungsten optimised agg function), here's a snippet of the failing test 
output

{code}
  == Physical Plan ==
  TungstenAggregate(key=[value#1148], 
functions=[(ClassInputAgg$(b#1050,a#1051),mode=Final,isDistinct=false)], 
output=[value#1148,ClassInputAgg$(b,a)#1162])
   TungstenExchange (HashPartitioning 5), None
TungstenAggregate(key=[value#1148], 
functions=[(ClassInputAgg$(b#1050,a#1051),mode=Partial,isDistinct=false)], 
output=[value#1148,value#1158])
 !AppendColumns , class[a[0]: int, b[0]: string], 
class[value[0]: string], [value#1148]
  Project [one AS b#1050,1 AS a#1051]
   Scan OneRowRelation[]
  == Results ==
  !== Correct Answer - 1 ==   == Spark Answer - 1 ==
  ![one,1][one,9] (QueryTest.scala:127)
{code}

This is for the third checkAnswer call in the reordering test:

{code}
checkAnswer(
  ds.groupBy(_.b).agg(ClassInputAgg.toColumn),
  ("one", 1))
{code}

If we change our SQL statement from 

{code}
val ds = sql("SELECT 'one' AS b, 1 as a").as[AggData]
{code}

so that a is, say, 2, we get 10. With 3, we get 11, etc.

> Address endian specific problems surfaced in 1.6
> 
>
> Key: SPARK-12319
> URL: https://issues.apache.org/jira/browse/SPARK-12319
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: BE platforms
>Reporter: Adam Roberts
>Priority: Critical
>
> JIRA to cover endian specific problems - since testing 1.6 I've noticed 
> problems with DataFrames on BE platforms, e.g. 
> https://issues.apache.org/jira/browse/SPARK-9858
> [~joshrosen] [~yhuai]
> Current progress: using com.google.common.io.LittleEndianDataInputStream and 
> com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer 
> fixes three test failures in ExchangeCoordinatorSuite, but I'm concerned 
> about performance and wider functional implications.
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input 
> with reordering" fails as we expect "one, 1" but instead get "one, 9" - we 
> believe the issue lies within BitSetMethods.java, specifically around: return 
> (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); 






[jira] [Resolved] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-4816.
--
Resolution: Fixed

I'm re-resolving this since I've tested this exact branch and master recently 
and observe the right native libs in the right place in the assembly.

> Maven profile netlib-lgpl does not work
> ---
>
> Key: SPARK-4816
> URL: https://issues.apache.org/jira/browse/SPARK-4816
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
> Environment: maven 3.0.5 / Ubuntu
>Reporter: Guillaume Pitel
>Priority: Minor
> Fix For: 1.1.1
>
>
> When doing what the documentation recommends to recompile Spark with the Netlib 
> native system binding (i.e. to bind with OpenBLAS or, in my case, MKL), 
> mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests 
> clean package
> the resulting assembly jar still lacked the netlib-system class. (I checked 
> the content of spark-assembly...jar)
> When forcing the netlib-lgpl profile in the MLlib package to be active, the jar 
> is correctly built.
> So I guess it's a problem with the way Maven passes profile activations to 
> child modules.
> Also, despite the documentation claiming that if the job's jar contains 
> netlib with the necessary bindings, it should work, it does not. The classloader 
> must be unhappy with two occurrences of netlib?






[jira] [Comment Edited] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property

2015-12-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055073#comment-15055073
 ] 

Sean Owen edited comment on SPARK-12311 at 12/14/15 1:45 PM:
-

LGTM, feel free to make a pull request.
EDIT: I knew I was missing something. Josh is right that there's a mixin trait 
that should fix this. Are you sure it's actually not working?


was (Author: srowen):
LGTM, feel free to make a pull request.

> [CORE] Restore previous value of "os.arch" property in test suites after 
> forcing to set specific value to "os.arch" property
> 
>
> Key: SPARK-12311
> URL: https://issues.apache.org/jira/browse/SPARK-12311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Although the current BlockManagerSuite.scala and SizeEstimatorSuite.scala set a 
> specific value (e.g. "amd64") into the system property "os.arch", they do not 
> restore the original value of "os.arch" after these test suites run. This may 
> lead to failures in a test case that depends on the architecture on platforms 
> other than amd64.
> They should save the original value of "os.arch" and restore it at the end 
> of these test suites.
>  
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala
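
A minimal sketch of the save-and-restore idea in a ScalaTest suite 
(illustrative suite name only; the actual fix may instead rely on the existing 
ResetSystemProperties mixin mentioned in the comments):

{code}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class ArchDependentSuite extends FunSuite with BeforeAndAfterAll {
  private var originalArch: String = _

  override def beforeAll(): Unit = {
    originalArch = System.getProperty("os.arch")
    System.setProperty("os.arch", "amd64") // force the value the tests assume
  }

  override def afterAll(): Unit = {
    // Put back whatever was there before, so later suites see the real value.
    if (originalArch != null) System.setProperty("os.arch", originalArch)
    else System.clearProperty("os.arch")
  }

  test("size estimation on amd64") { /* ... */ }
}
{code}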






[jira] [Commented] (SPARK-9578) Stemmer feature transformer

2015-12-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055979#comment-15055979
 ] 

Apache Spark commented on SPARK-9578:
-

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/10272

> Stemmer feature transformer
> ---
>
> Key: SPARK-9578
> URL: https://issues.apache.org/jira/browse/SPARK-9578
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Transformer mentioned first in [SPARK-5571] based on suggestion from 
> [~aloknsingh].  Very standard NLP preprocessing task.
> From [~aloknsingh]:
> {quote}
> We have one Scala stemmer in scalanlp%chalk 
> https://github.com/scalanlp/chalk/tree/master/src/main/scala/chalk/text/analyze
>   which can easily be copied (as it is an Apache project) and is in Scala too.
> I think this will be a better alternative than the Lucene EnglishAnalyzer or 
> OpenNLP.
> Note: we already use scalanlp%breeze via the Maven dependency, so I think 
> adding a scalanlp%chalk dependency is also an option. But as you said, we 
> can copy the code as it is small.
> {quote}
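
As a rough illustration of how such a transformer could slot into spark.ml (a 
hypothetical class with a toy suffix-stripping rule standing in for a real 
stemming algorithm; not the implementation in the linked pull request):

{code}
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

// Maps a sequence of tokens to their stems.
class Stemmer(override val uid: String)
  extends UnaryTransformer[Seq[String], Seq[String], Stemmer] {

  def this() = this(Identifiable.randomUID("stemmer"))

  private def stem(word: String): String = word.replaceAll("(ing|ed|s)$", "")

  override protected def createTransformFunc: Seq[String] => Seq[String] = _.map(stem)

  override protected def outputDataType: DataType = ArrayType(StringType)
}
{code}

Like other unary transformers it would be configured with the usual 
setInputCol/setOutputCol calls and composed in a Pipeline after a Tokenizer.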






[jira] [Updated] (SPARK-12275) No plan for BroadcastHint in some condition

2015-12-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12275:
--
Labels: backport-needed  (was: )

> No plan for BroadcastHint in some condition
> ---
>
> Key: SPARK-12275
> URL: https://issues.apache.org/jira/browse/SPARK-12275
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: yucai
>Assignee: yucai
>  Labels: backport-needed
> Fix For: 1.6.1, 2.0.0
>
>
> *Summary*
> No plan for BroadcastHint is generated under some conditions.
> *Test Case*
> {code}
> val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value")
> val parquetTempFile =
>   "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), 
> scala.util.Random.nextInt)
> df1.write.parquet(parquetTempFile)
> val pf1 = sqlContext.read.parquet(parquetTempFile)
> #1. df1.join(broadcast(pf1)).count()
> #2. broadcast(pf1).count()
> {code}
> *Result*
> It will trigger assertion in QueryPlanner.scala, like below:
> {code}
> scala> df1.join(broadcast(pf1)).count()
> java.lang.AssertionError: assertion failed: No plan for BroadcastHint
> +- Relation[key#6,value#7] 
> ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet]
>   at scala.Predef$.assert(Predef.scala:179)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
> {code}






[jira] [Assigned] (SPARK-12321) JSON format for logical/physical execution plans

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12321:


Assignee: Apache Spark

> JSON format for logical/physical execution plans
> 
>
> Key: SPARK-12321
> URL: https://issues.apache.org/jira/browse/SPARK-12321
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-12321) JSON format for logical/physical execution plans

2015-12-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056048#comment-15056048
 ] 

Apache Spark commented on SPARK-12321:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/10295

> JSON format for logical/physical execution plans
> 
>
> Key: SPARK-12321
> URL: https://issues.apache.org/jira/browse/SPARK-12321
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Wenchen Fan
>







[jira] [Assigned] (SPARK-12321) JSON format for logical/physical execution plans

2015-12-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12321:


Assignee: (was: Apache Spark)

> JSON format for logical/physical execution plans
> 
>
> Key: SPARK-12321
> URL: https://issues.apache.org/jira/browse/SPARK-12321
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Wenchen Fan
>







[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work

2015-12-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056348#comment-15056348
 ] 

Sean Owen commented on SPARK-4816:
--

I tried your exact build above and I get lots of hits for netlib-native:

{code}
$ zipinfo -1 assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.4.0.jar | 
grep netlib-native
netlib-native_ref-osx-x86_64.jnilib
netlib-native_ref-osx-x86_64.jnilib.asc
netlib-native_ref-osx-x86_64.pom
netlib-native_ref-osx-x86_64.pom.asc
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.properties
netlib-native_ref-linux-x86_64.so
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.xml
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.properties
netlib-native_ref-linux-i686.so
...
{code}

Still not sure what to make of the difference here, though it seems to work 
here. You're saying that at worst you think it's fixed for 1.4.2 by SPARK-9507. 
In that case I suppose it's already resolved, just not in the 1.4.1 release. 
But as I say it does seem to work for me.

> Maven profile netlib-lgpl does not work
> ---
>
> Key: SPARK-4816
> URL: https://issues.apache.org/jira/browse/SPARK-4816
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
> Environment: maven 3.0.5 / Ubuntu
>Reporter: Guillaume Pitel
>Priority: Minor
> Fix For: 1.1.1
>
>
> When doing what the documentation recommends to recompile Spark with the Netlib 
> native system binding (i.e. to bind with OpenBLAS or, in my case, MKL), 
> mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests 
> clean package
> the resulting assembly jar still lacked the netlib-system class. (I checked 
> the content of spark-assembly...jar)
> When forcing the netlib-lgpl profile in the MLlib package to be active, the jar 
> is correctly built.
> So I guess it's a problem with the way Maven passes profile activations to 
> child modules.
> Also, despite the documentation claiming that if the job's jar contains 
> netlib with the necessary bindings, it should work, it does not. The classloader 
> must be unhappy with two occurrences of netlib?






[jira] [Commented] (SPARK-12219) Spark 1.5.2 code does not build on Scala 2.11.7 with SBT assembly

2015-12-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056339#comment-15056339
 ] 

Sean Owen commented on SPARK-12219:
---

I tried building branch 1.5 and I do see some of the same messages; however, 
they're warnings in the Maven build, at least, while they're errors in SBT. 
I'm not sure the difference is intended, but technically speaking the 
official 1.5 build works with 2.11. Let's have a look at the state of 
master/1.6 here. It's still a good idea to zap warnings like this if we can.

> Spark 1.5.2 code does not build on Scala 2.11.7 with SBT assembly
> -
>
> Key: SPARK-12219
> URL: https://issues.apache.org/jira/browse/SPARK-12219
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.2
>Reporter: Rodrigo Boavida
>
> I've tried, with no success, to build Spark on Scala 2.11.7. I'm getting build 
> errors using SBT due to the issues found in the thread below, from July of this 
> year.
> https://mail-archives.apache.org/mod_mbox/spark-dev/201507.mbox/%3CCA+3qhFSJGmZToGmBU1=ivy7kr6eb7k8t6dpz+ibkstihryw...@mail.gmail.com%3E
> It seems some minor fixes are needed to make the Scala 2.11 compiler happy.
> I needed to build with SBT, as suggested in the thread below, to get around an 
> apparent Maven shade plugin issue which changed some classes when I changed 
> to Akka 2.4.0.
> https://groups.google.com/forum/#!topic/akka-user/iai6whR6-xU
> I've set this bug to Major priority, assuming that the Spark community wants 
> to keep fully supporting SBT builds, including Scala 2.11 compatibility.
> Tnks,
> Rod






[jira] [Created] (SPARK-12323) Don't assign default value for non-nullable columns of a Dataset

2015-12-14 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12323:
--

 Summary: Don't assign default value for non-nullable columns of a 
Dataset
 Key: SPARK-12323
 URL: https://issues.apache.org/jira/browse/SPARK-12323
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0, 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian


For a field of a Dataset, if it's specified as non-nullable in the schema of 
the Dataset, we shouldn't assign a default value for it if the input data 
contains null. Instead, a runtime exception with a nice error message should be 
thrown, asking the user to use {{Option}} or a nullable type (e.g., 
{{java.lang.Integer}} instead of {{scala.Int}}).
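
For illustration (hypothetical case classes sketching the intent described 
above):

{code}
// Non-nullable field: under the proposed behaviour, a null in the input should
// raise a clear runtime error rather than silently becoming a default value (0).
case class Strict(a: Int)

// Nullable alternatives the error message would point the user towards.
case class WithOption(a: Option[Int])
case class WithBoxed(a: java.lang.Integer)

// e.g. sqlContext.read.json("people.json").as[WithOption]
{code}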






[jira] [Commented] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property

2015-12-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056317#comment-15056317
 ] 

Kazuaki Ishizaki commented on SPARK-12311:
--

Yes. Even when I run the core tests on a ppc64le machine, I get "amd64" from 
System.getProperty("os.arch") in another test suite (e.g. 
StorageStatusListenerSuite, which is executed later).
Although this mismatch does not seem to cause any test failure now, it could 
lead to a test failure in the future.

> [CORE] Restore previous value of "os.arch" property in test suites after 
> forcing to set specific value to "os.arch" property
> 
>
> Key: SPARK-12311
> URL: https://issues.apache.org/jira/browse/SPARK-12311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Although the current BlockManagerSuite.scala and SizeEstimatorSuite.scala set a 
> specific value (e.g. "amd64") into the system property "os.arch", they do not 
> restore the original value of "os.arch" after these test suites run. This may 
> lead to failures in a test case that depends on the architecture on platforms 
> other than amd64.
> They should save the original value of "os.arch" and restore it at the end 
> of these test suites.
>  
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala





