Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-09 Thread Chetan Khatri
Hello Jayant, Thank you so much for the suggestion. My idea was to use a Python function as a transformation that takes a couple of column names and returns an object, which you explained. Would it be possible to point me to a similar codebase example? Thanks. On Fri, Jul 6, 2018 at 2:56 AM, Jayant
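
A minimal sketch of one common way to run Python logic from a Scala Spark job: stream serialized rows through an external Python process with RDD.pipe. The script name, input path, and column names here are assumptions for illustration, not from the thread.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("pipe-example").getOrCreate()
val df = spark.read.parquet("/data/input") // hypothetical input

// Serialize the columns the Python function needs, one record per line,
// pipe them through the script, and read its stdout back as an RDD[String].
val transformed = df.rdd
  .map(row => s"${row.getAs[String]("colA")}\t${row.getAs[Int]("colB")}")
  .pipe("python transform.py")

transformed.take(10).foreach(println)
```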

[Structured Streaming] Last processed event time always behind Direct Streaming

2018-07-09 Thread subramgr
Hi We are migrating our Direct Streaming Spark job to Structured Streaming. We have a batch size of 1 minute. I am consistently seeing that the Structured Streaming job is always (3-5 minutes) behind the Direct Streaming job. Is there some kind of fine-tuning that will help Structured Streaming
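
Not a diagnosis, but one knob worth checking when comparing against a fixed 1-minute DStream batch is the trigger interval: without an explicit trigger, Structured Streaming starts each micro-batch as soon as the previous one finishes. A minimal sketch, assuming `df` is a streaming Dataset; the sink, output mode, and checkpoint path are placeholders:

```scala
import org.apache.spark.sql.streaming.Trigger

// Align the micro-batch cadence with the old 1-minute batch size.
val query = df.writeStream
  .trigger(Trigger.ProcessingTime("1 minute"))
  .outputMode("update")
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints") // illustrative
  .start()
```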

[Structured Streaming] User Define Aggregation Function

2018-07-09 Thread subramgr
Hi I am trying to explore how I can use UDAF for my use case. I have something like this in my Structured Streaming Job. val counts: Dataset[(String, Double)] = events .withWatermark("timestamp", "30 minutes") .groupByKey(e => e._2.siteIdentifier + "|" +
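
For a typed groupByKey pipeline like the one above, org.apache.spark.sql.expressions.Aggregator is the usual way to plug in a custom aggregation. A minimal sketch computing a mean over Double values; the actual metric and types in the job are not known from the thread:

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Typed aggregator: Double input, (sum, count) buffer, mean as output.
object MeanAgg extends Aggregator[Double, (Double, Long), Double] {
  def zero: (Double, Long) = (0.0, 0L)
  def reduce(b: (Double, Long), a: Double): (Double, Long) = (b._1 + a, b._2 + 1)
  def merge(b1: (Double, Long), b2: (Double, Long)): (Double, Long) =
    (b1._1 + b2._1, b1._2 + b2._2)
  def finish(b: (Double, Long)): Double = if (b._2 == 0) 0.0 else b._1 / b._2
  def bufferEncoder: Encoder[(Double, Long)] =
    Encoders.tuple(Encoders.scalaDouble, Encoders.scalaLong)
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Usage on a keyed stream: grouped.agg(MeanAgg.toColumn.name("mean"))
```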

Re: [Structured Streaming] Reading Checkpoint data

2018-07-09 Thread subramgr
thanks

Re: [Structured Streaming] Reading Checkpoint data

2018-07-09 Thread Tathagata Das
Only the stream metadata (e.g., stream id, offsets) is stored as JSON. The stream state data is stored in an internal binary format. On Mon, Jul 9, 2018 at 4:07 PM, subramgr wrote: > Hi, > > I read somewhere that with Structured Streaming all the checkpoint data is > more readable (JSON-like).
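
A quick way to see the distinction TD describes: the files under the checkpoint's `offsets` directory are plain text (a version line followed by JSON per source), while the `state` directory holds binary state store files. A sketch, assuming a locally accessible checkpoint path and batch number:

```scala
import scala.io.Source

// Offset log entries are human-readable; state store files are not.
Source.fromFile("/checkpoints/my-query/offsets/42").getLines().foreach(println)
```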

[Structured Streaming] Reading Checkpoint data

2018-07-09 Thread subramgr
Hi, I read somewhere that with Structured Streaming all the checkpoint data is more readable (JSON-like). Is there any documentation on how to read the checkpoint data? If I do `hadoop fs -ls` on the `state` directory I get some encoded data. Thanks, Girish

Re: Kubernetes security context when submitting job through k8s servers

2018-07-09 Thread Yinan Li
It's still under design review. It's unlikely that it will go into 2.4. On Mon, Jul 9, 2018 at 3:46 PM trung kien wrote: > Thanks Li, > > I read through the ticket; being able to pass a pod YAML file would be amazing. > > Do you have any target date for production or incubator? I really want to >

Re: Kubernetes security context when submitting job through k8s servers

2018-07-09 Thread trung kien
Thanks Li, I read through the ticket; being able to pass a pod YAML file would be amazing. Do you have any target date for production or incubator? I really want to try out this feature. On Mon, Jul 9, 2018 at 4:48 PM Yinan Li wrote: > Spark on k8s currently doesn't support specifying a custom

Pyspark access to scala/java libraries

2018-07-09 Thread Mohit Jaggi
Folks, I am writing some Scala/Java code and want it to be usable from pyspark. For example: class MyStuff(addend: Int) { def myMapFunction(x: Int) = x + addend } I want to call it from pyspark as: df = ... mystuff = sc._jvm.MyStuff(5) df['x'].map(lambda x: mystuff.myMapFunction(x))
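
A JVM handle obtained via sc._jvm lives on the driver and cannot be captured in a Python lambda that runs on executors, so that map will fail at serialization time. One route is to expose the Scala logic as a Spark SQL UDF and register it from PySpark by class name; a hedged sketch, where the package/class name and fixed addend are assumptions:

```scala
package com.example

import org.apache.spark.sql.api.java.UDF1

// Wrap the logic in Spark's Java UDF interface so PySpark can register it.
// registerJavaFunction instantiates the class reflectively, so a no-arg
// constructor is required; the addend is fixed here for illustration.
class MyMapFunction extends UDF1[Integer, Integer] {
  private val addend = 5
  override def call(x: Integer): Integer = x + addend
}

// From PySpark (the class must be on the driver/executor classpath):
//   from pyspark.sql.types import IntegerType
//   spark.udf.registerJavaFunction("myMapFunction", "com.example.MyMapFunction", IntegerType())
//   df.selectExpr("myMapFunction(x)").show()
```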

Re: Kubernetes security context when submitting job through k8s servers

2018-07-09 Thread Yinan Li
Spark on k8s currently doesn't support specifying a custom SecurityContext of the driver/executor pods. This will be supported by the solution to https://issues.apache.org/jira/browse/SPARK-24434. On Mon, Jul 9, 2018 at 2:06 PM trung kien wrote: > Dear all, > > Is there any way to includes

Kubernetes security context when submitting job through k8s servers

2018-07-09 Thread trung kien
Dear all, Is there any way to include a security context ( https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) when submitting a job through the k8s servers? I'm trying to run my first Spark jobs on Kubernetes through spark-submit: bin/spark-submit --master k8s://https://API_SERVERS

Re: Dataframe joins - UnsupportedOperationException: Unimplemented type: IntegerType

2018-07-09 Thread Vadim Semenov
That usually happens when you have different types for a column across parquet files. In this case, I think you have a column of `Long` type that got a file with `Integer` type; I had to deal with a similar problem once. You would have to cast it yourself to Long. On Mon, Jul 9, 2018 at 2:53 PM
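
A sketch of the cast Vadim describes; the dataframe and column names are assumptions. Normalizing the mixed Integer/Long column on both sides before the join makes every input file's values agree with the declared Long type:

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

val leftFixed  = leftDf.withColumn("join_key", col("join_key").cast(LongType))
val rightFixed = rightDf.withColumn("join_key", col("join_key").cast(LongType))
val joined = leftFixed.join(rightFixed, "join_key")
```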

Re: Dynamic allocation not releasing executors after unpersisting all cached data

2018-07-09 Thread Vadim Semenov
Try doing `unpersist(blocking=true)` On Mon, Jul 9, 2018 at 2:59 PM Jeffrey Charles wrote: > > I'm persisting a dataframe in Zeppelin which has dynamic allocation enabled > to get a sense of how much memory the dataframe takes up. After I note the > size, I unpersist the dataframe. For some
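
A sketch combining the blocking unpersist with the dynamic-allocation timeouts that govern when executors are actually released; note that spark.dynamicAllocation.cachedExecutorIdleTimeout defaults to infinity, so executors holding cached blocks are never reclaimed unless it is set. The values below are illustrative:

```scala
// Synchronously drop the cached blocks so the executors become idle right away.
df.unpersist(blocking = true)

// Relevant settings (spark-defaults.conf or --conf on submit):
//   spark.dynamicAllocation.executorIdleTimeout=60s
//   spark.dynamicAllocation.cachedExecutorIdleTimeout=120s   // default: infinity
```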

Dynamic allocation not releasing executors after unpersisting all cached data

2018-07-09 Thread Jeffrey Charles
I'm persisting a dataframe in Zeppelin which has dynamic allocation enabled to get a sense of how much memory the dataframe takes up. After I note the size, I unpersist the dataframe. For some reason, YARN is not releasing the executors that were added to Zeppelin. If I don't run the persist and

Dataframe joins - UnsupportedOperationException: Unimplemented type: IntegerType

2018-07-09 Thread Nirav Patel
I am getting the following error after performing a join between 2 dataframes. It happens on the call to the .show() method. I assume it's an issue with incompatible types, but it's been really hard to identify which column of which dataframe has that incompatibility. Any pointers? 11:06:10.304 13700 [Executor

Spark on Mesos - Weird behavior

2018-07-09 Thread Thodoris Zois
Hello list, We are running Apache Spark on a Mesos cluster and we face a weird behavior of executors. When we submit an app with e.g. 10 cores and 2GB of memory and max cores 30, we expect to see 3 executors running on the cluster. However, sometimes there are only 2... Spark applications are
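
Not from the thread, but one plausible explanation: in Mesos coarse-grained mode the executor count depends on the offers received, and unless spark.executor.cores is set an executor can absorb more cores from a large offer, yielding fewer, larger executors. A sketch of pinning the sizing, with values mirroring the example above:

```scala
import org.apache.spark.SparkConf

// With max cores 30 and 10 cores per executor, Spark should aim for 3 executors,
// though the outcome still depends on agents offering enough free resources.
val conf = new SparkConf()
  .set("spark.cores.max", "30")
  .set("spark.executor.cores", "10")
  .set("spark.executor.memory", "2g")
```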

[REST API] Rest API unusable due to application id changing

2018-07-09 Thread bsikander
Spark gives a nice REST API to get metrics: https://spark.apache.org/docs/latest/monitoring.html#rest-api The problem is that this API is keyed by application id, which can change if we are running in supervise mode. Any application that is built on the REST API has to deal with changing
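
One workaround sketch (not an official recommendation): resolve the current application id by name at startup through the same REST API, instead of persisting an id that supervise mode may replace. The host/port and the regex-based extraction are assumptions; a real client would use a JSON parser:

```scala
import scala.io.Source

// Fetch the application list from the monitoring REST API and pull out an id
// for subsequent /api/v1/applications/{id}/... calls.
val json = Source.fromURL("http://driver-host:4040/api/v1/applications").mkString
val idPattern = """"id"\s*:\s*"([^"]+)"""".r
val appId = idPattern.findFirstMatchIn(json).map(_.group(1))
  .getOrElse(sys.error("no application found"))

println(s"current application id: $appId")
```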

Register now for ApacheCon and save $250

2018-07-09 Thread Rich Bowen
Greetings, Apache software enthusiasts! (You’re getting this because you’re on one or more dev@ or users@ lists for some Apache Software Foundation project.) ApacheCon North America, in Montreal, is now just 80 days away, and early bird prices end in just two weeks - on July 21. Prices will

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-09 Thread kant kodali
@yohann Looks like something is wrong with my environment, which I have yet to figure out, but the theory so far makes sense. I also tried it in other environments with a very minimal configuration like mine, and it works fine, so clearly something is wrong with my env; I don't know why

Re: Strange behavior of Spark Masters during rolling update

2018-07-09 Thread bsikander
Anyone?

Re: Automatic Json Schema inference using Structured Streaming

2018-07-09 Thread chandan prakash
Hi Swetha, I also had the same requirement: reading JSON from Kafka and writing back in parquet format. I did a workaround: 1. Inferred the schema using the batch API by reading the first few rows. 2. Started streaming using the schema inferred in step 1. *Limitation*: will not work if you
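
A sketch of chandan's two-step workaround, with the Kafka options and topic name as placeholders: infer once with the batch reader, then reuse the schema in the streaming reader.

```scala
import org.apache.spark.sql.functions.{col, from_json}
import spark.implicits._

// Step 1: batch-read a sample of the topic and let spark.read.json infer the schema.
val sample = spark.read.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .select(col("value").cast("string")).as[String]
val schema = spark.read.json(sample).schema

// Step 2: start the stream with the inferred schema.
val stream = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .select(from_json(col("value").cast("string"), schema).as("data"))
```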

Re: How to branch a Stream / have multiple Sinks / do multiple Queries on one Stream

2018-07-09 Thread chandan prakash
Thanks Amiya/TD for responding. @TD, thanks for letting us know about this new foreachBatch API; this handle on the per-batch dataframe should be useful in many cases. @Amiya, the input source will be read twice and the entire DAG computation will be done twice. Not a limitation, but resource utilisation and
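
For contrast with the two-query approach, a sketch of the foreachBatch route TD mentions (part of the then-upcoming Spark 2.4 API), assuming `streamingDf` is a streaming DataFrame; sink formats and paths are illustrative. The batch DataFrame is computed once, persisted, and written to both sinks:

```scala
import org.apache.spark.sql.DataFrame

val query = streamingDf.writeStream
  .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
    batchDf.persist()                          // reuse across both writes
    batchDf.write.mode("append").parquet("/sinks/parquet")
    batchDf.write.mode("append").json("/sinks/json")
    batchDf.unpersist()
  }
  .option("checkpointLocation", "/tmp/chk")    // illustrative
  .start()
```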