None.get on Redact in DataSourceScanExec

2017-07-13 Thread Russell Spitzer
Sorry if this is a double post; I wasn't sure if I got through on my forwarding. I mentioned this in the RC2 note for 2.2.0 of Spark and I'm seeing it now on the official release. Running the Spark Cassandra Connector integration tests for the SCC now fails whenever trying to do something involving the CassandraSource being transformed into the DataSourceScanExec SparkPlan.

Re: Does mapWithState need checkpointing to be specified in Spark Streaming?

2017-07-13 Thread swetha kasireddy
OK. Thanks, TD. Does stateSnapshots() bring a snapshot of the state of all the keys managed by mapWithState, or just the state of the keys in the current micro-batch? It's somewhat conflicting, because the following link says that it brings the state only for the keys seen in the

Re: Does mapWithState need checkpointing to be specified in Spark Streaming?

2017-07-13 Thread Tathagata Das
Yes, it does. On that note, Spark 2.2 (released a couple of days ago) adds mapGroupsWithState in Structured Streaming. That is like mapWithState on steroids. Just saying. :) On Thu, Jul 13, 2017 at 1:01 PM, SRK wrote: > Hi, > > Do we need to specify checkpointing for

Does mapWithState need checkpointing to be specified in Spark Streaming?

2017-07-13 Thread SRK
Hi, Do we need to specify checkpointing for mapWithState just like we do for updateStateByKey? Thanks, Swetha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-mapWithState-need-checkpointing-to-be-specified-in-Spark-Streaming-tp28858.html Sent from
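The answer in this thread is yes: mapWithState, like updateStateByKey, needs a checkpoint directory because state is carried across micro-batches. A minimal Scala sketch of the setup in question (the socket source, port, and `updateFunc` are illustrative, not from the thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MapWithStateSketch extends App {
  val conf = new SparkConf().setAppName("MapWithStateSketch").setMaster("local[2]")
  val ssc  = new StreamingContext(conf, Seconds(1))
  // State tracking requires a checkpoint directory, just as with updateStateByKey
  ssc.checkpoint("/tmp/checkpoint")

  val lines = ssc.socketTextStream("localhost", 9999)
  val pairs = lines.flatMap(_.split(" ")).map(w => (w, 1))

  // Running count per key, kept in State across micro-batches
  def updateFunc(key: String, one: Option[Int], state: State[Int]): (String, Int) = {
    val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum)
    (key, sum)
  }

  val counts = pairs.mapWithState(StateSpec.function(updateFunc _))
  counts.print()
  ssc.start()
  ssc.awaitTermination()
}
```

Without the `ssc.checkpoint(...)` call, the job fails at start-up because the state DStream cannot be recovered after a failure.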

Re: DataFrameReader read from S3 org.apache.spark.sql.AnalysisException: Path does not exist

2017-07-13 Thread Sumona Routh
Yes, which is what I eventually did. I wanted to check if there was some "mode" type, similar to SaveMode with writers. Appears that there genuinely is no option for this and it has to be handled by the client using the exception flow. Thanks, Sumona On Wed, Jul 12, 2017 at 4:59 PM Yong Zhang
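Since there is no reader-side equivalent of SaveMode, a sketch of the exception-flow workaround described above (the path, format, and the message check are illustrative assumptions):

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, SparkSession}

object SafeRead {
  // Return None when the path does not exist instead of failing the whole job.
  // Matching on the message text is brittle but there is no dedicated exception type.
  def readIfExists(spark: SparkSession, path: String): Option[DataFrame] =
    try {
      Some(spark.read.parquet(path))
    } catch {
      case e: AnalysisException if e.getMessage.contains("Path does not exist") => None
    }
}
```

A caller can then fall back to an empty DataFrame or skip the input, which is effectively the missing "read mode".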

Re: underlying checkpoint

2017-07-13 Thread Bernard Jesop
Thank you; one of my mistakes was to think that show() was an action. 2017-07-13 17:52 GMT+02:00 Vadim Semenov: > You need to trigger an action on that rdd to checkpoint it. > > ``` > scala> spark.sparkContext.setCheckpointDir(".") > > scala> val df =

Re: underlying checkpoint

2017-07-13 Thread Vadim Semenov
You need to trigger an action on that rdd to checkpoint it.

```
scala> spark.sparkContext.setCheckpointDir(".")

scala> val df = spark.createDataFrame(List(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)))
df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala>
```

underlying checkpoint

2017-07-13 Thread Bernard Jesop
Hi everyone, I just tried this simple program:

```
import org.apache.spark.sql.SparkSession

object CheckpointTest extends App {
  val spark = SparkSession
    .builder()
    .appName("Toto")
    .getOrCreate()
  spark.sparkContext.setCheckpointDir(".")
  val df =
```
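Putting the thread's answer together, a sketch of the program with an action added so the checkpoint actually runs (assuming Spark 2.x; `master("local[*]")` and the sample data follow the reply upthread). `rdd.checkpoint()` only marks the RDD; the data is written when an action runs on that same RDD:

```scala
import org.apache.spark.sql.SparkSession

object CheckpointTest extends App {
  val spark = SparkSession.builder().appName("Toto").master("local[*]").getOrCreate()
  spark.sparkContext.setCheckpointDir(".")

  val df  = spark.createDataFrame(List(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)))
  val rdd = df.rdd
  rdd.checkpoint()             // only marks the RDD; nothing is written yet
  rdd.count()                  // an action on that same RDD triggers the checkpoint
  println(rdd.isCheckpointed)  // should now report true

  spark.stop()
}
```

Note that Dataset.checkpoint() (available since Spark 2.1) is eager by default, so it is an alternative that does not require a separate action.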

Re: Spark 2.1.1: A bug in org.apache.spark.ml.linalg.* when using VectorAssembler.scala

2017-07-13 Thread Yan Facai
Hi, junjie. As Nick said, spark.ml indeed contains Vector, Vectors and VectorUDT by itself, see: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:36: sealed trait Vector extends Serializable So, which bug do you find with VectorAssembler? Could you give more details?

[SQL] Syntax "case when" isn't supported in JOIN

2017-07-13 Thread ????
Hi All, I'm trying to execute Hive SQL on Spark SQL (also on the Spark Thrift Server). To mitigate data skew, we use "case when" to handle nulls. A simple example:

```
SELECT a.col1
FROM tbl1 a
LEFT OUTER JOIN tbl2 b
  ON CASE WHEN a.col2 IS NULL
```
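One common workaround for this shape of query (an assumption, not the thread's resolution) is to move the CASE WHEN out of the ON clause into a projection, so the join condition stays a plain equality. The table and column names below come from the mail; the `skew_` salting of null keys is a standard skew-mitigation trick, sketched here:

```scala
// Hypothetical rewrite: compute the join key with CASE WHEN in a subquery,
// then join on the derived column, keeping the ON clause a simple equality.
val query =
  """
    |SELECT a.col1
    |FROM (
    |  SELECT col1,
    |         CASE WHEN col2 IS NULL THEN concat('skew_', rand()) ELSE col2 END AS join_key
    |  FROM tbl1
    |) a
    |LEFT OUTER JOIN tbl2 b
    |  ON a.join_key = b.col2
  """.stripMargin
// spark.sql(query).show()   // run inside an active SparkSession
```

The random salt spreads the formerly-null keys across partitions while guaranteeing they never match a real key on the right side.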

Re: Spark 2.1.1: A bug in org.apache.spark.ml.linalg.* when using VectorAssembler.scala

2017-07-13 Thread Nick Pentreath
There are Vector classes under the ml.linalg package, and VectorAssembler and the other feature transformers all work with ml.linalg vectors. If you try to use mllib.linalg vectors instead, you will get an error because the user-defined type for SQL is not correct. On Thu, 13 Jul 2017 at 11:23,

Spark 2.1.1: A bug in org.apache.spark.ml.linalg.* when using VectorAssembler.scala

2017-07-13 Thread xiongjunjie
Dear Developers: There is a bug in org.apache.spark.ml.linalg.*: the classes Vector and Vectors are not included in org.apache.spark.ml.linalg.*, but they are used in VectorAssembler.scala as follows: import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT} Therefore, a bug was reported when I was
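As the replies point out, the imports do resolve in a stock Spark 2.1.1 build: Vector and Vectors live in the mllib-local module, which spark-mllib depends on. A sketch of VectorAssembler working with ml.linalg vectors (column names and values are illustrative):

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

object AssemblerSketch extends App {
  val spark = SparkSession.builder().appName("AssemblerSketch").master("local[*]").getOrCreate()
  import spark.implicits._

  val df = Seq((1.0, 2.0, 3.0)).toDF("a", "b", "c")
  val assembler = new VectorAssembler()
    .setInputCols(Array("a", "b", "c"))
    .setOutputCol("features")

  // The output column holds org.apache.spark.ml.linalg.Vector values
  val out = assembler.transform(df)
  val v: Vector = out.select("features").head.getAs[Vector](0)
  println(v == Vectors.dense(1.0, 2.0, 3.0))

  spark.stop()
}
```

If this fails to compile, the classpath is likely mixing Spark artifact versions, or pulling in mllib.linalg instead of ml.linalg.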

how to identify the alive master spark via Zookeeper ?

2017-07-13 Thread marina.brunel
Hello, In our project, we have a Spark cluster with 2 masters and 4 workers, and Zookeeper decides which master is alive. We have a problem with our reverse proxy for displaying the Spark Web UI. The RP redirects to a master whose IP address is set in the initial configuration, but if Zookeeper
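One way to avoid hard-coding a single master in the RP (an assumption, not the thread's resolution) is to probe each master's web UI: the standalone master exposes a JSON status page at /json on its UI port (8080 by default), and its status field reads ALIVE only on the active master. A sketch, with hostnames and the response-matching logic as illustrative assumptions:

```scala
import scala.io.Source
import scala.util.Try

object AliveMaster {
  // Probe each master's web UI and return the first one reporting ALIVE.
  // Crude substring check to avoid a JSON-parsing dependency; adjust to the
  // exact JSON layout your Spark version emits.
  def findAlive(masters: Seq[String]): Option[String] =
    masters.find { hostPort =>
      Try(Source.fromURL(s"http://$hostPort/json").mkString)
        .toOption
        .exists(body => body.contains("\"status\"") && body.contains("ALIVE"))
    }
}

// Usage sketch: AliveMaster.findAlive(Seq("master1:8080", "master2:8080"))
```

The RP (or a small health-check script feeding it) can then point at whichever master this probe reports as alive, instead of the statically configured one.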

UnpicklingError while using spark streaming

2017-07-13 Thread lovemoon
spark 2.1.1 & python 2.7.11. I want to union another RDD in DStream.transform(), like below:

```
sc = SparkContext()
ssc = StreamingContext(sc, 1)
init_rdd = sc.textFile('file:///home/zht/PycharmProjects/test/text_file.txt')
lines = ssc.socketTextStream('localhost', )
```