Re: Spark on Kubernetes: Kubernetes killing executors because of overallocation of memory

2018-08-02 Thread Matt Cheah
Hi there, You may want to set the memory overhead higher. Spark will then start containers with a higher memory limit (spark.executor.memory + spark.executor.memoryOverhead, to be exact) while the heap is still locked to spark.executor.memory. There’s some memory used
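
A minimal PySpark sketch of the change being suggested: raise the off-heap headroom so the pod's memory limit (heap + overhead) leaves room for non-heap allocations. The 2g value is illustrative, not a recommendation, and on some Spark-on-K8s versions the property is spark.kubernetes.executor.memoryOverhead instead.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("k8s-memory-overhead-sketch")
        # Heap size: what spark.executor.memory controls.
        .config("spark.executor.memory", "7g")
        # Extra headroom; the container limit becomes memory + memoryOverhead.
        .config("spark.executor.memoryOverhead", "2g")
        .getOrCreate()
    )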

Insert into dynamic partitioned hive/parquet table throws error - Partition spec contains non-partition columns

2018-08-02 Thread Nirav Patel
I am trying to insert overwrite multiple partitions into an existing partitioned hive/parquet table. The table was created using sparkSession. I have a table 'mytable' with partitions P1 and P2. I have the following set on the sparkSession object: .config("hive.exec.dynamic.partition", true)
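
A minimal PySpark sketch of a dynamic-partition INSERT OVERWRITE, assuming 'mytable' already exists as a Hive/Parquet table partitioned by (P1, P2). Table and column names follow the thread; the row data is made up.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .enableHiveSupport()
        .config("hive.exec.dynamic.partition", "true")
        .config("hive.exec.dynamic.partition.mode", "nonstrict")
        .getOrCreate()
    )

    df = spark.createDataFrame(
        [("a1", "b1", "c1", "p1a", "p2a")],
        ["A", "B", "C", "P1", "P2"],
    )
    df.createOrReplaceTempView("updateTable")

    # With dynamic partitioning, the PARTITION clause lists only the partition
    # column names, and the SELECT puts them last, in the same order.
    spark.sql("""
        INSERT OVERWRITE TABLE mytable PARTITION (P1, P2)
        SELECT A, B, C, P1, P2 FROM updateTable
    """)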

Re: [External Sender] re: streaming, batch / spark 2.2.1

2018-08-02 Thread Peter Liu
thanks for the clarification. the processing time on both systems seems to be fine: (a) based on the pattern of the batch processing time chart, i.e. the batch processing time is not becoming longer and longer (see charts attached below); (b) the input data on each spark stage of every batch remains

Re: re: streaming, batch / spark 2.2.1

2018-08-02 Thread zakhavan
Yes, I am loading a text file from my local machine into a kafka topic using the script below, and I'd like to calculate the number of samples per second consumed by the kafka consumer. if __name__ == "__main__": print("hello spark") sc = SparkContext(appName="STALTA") ssc =
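
A rough sketch (Spark 2.2.x style, with the spark-streaming-kafka-0-8 package on the classpath) of measuring records per second on the consumer side: count each micro-batch and divide by the batch interval. The topic name and broker address are placeholders.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    BATCH_SECONDS = 2

    sc = SparkContext(appName="STALTA")
    ssc = StreamingContext(sc, BATCH_SECONDS)

    stream = KafkaUtils.createDirectStream(
        ssc, ["mytopic"], {"metadata.broker.list": "localhost:9092"}
    )

    def report_rate(rdd):
        # Records in this batch divided by the batch length in seconds.
        n = rdd.count()
        print("batch records: %d, approx rate: %.1f records/sec"
              % (n, n / float(BATCH_SECONDS)))

    stream.foreachRDD(report_rate)

    ssc.start()
    ssc.awaitTermination()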

Re: [External Sender] re: streaming, batch / spark 2.2.1

2018-08-02 Thread Jayesh Lalwani
What is different between the 2 systems? If one system processes records faster than the other, simply because it does less processing, then you can expect the first system to have a higher throughput than the second. It's hard to say why one system has double the throughput of another without

Re: re: streaming, batch / spark 2.2.1

2018-08-02 Thread zakhavan
Hello, I just had a question. Could you refer me to a link or tell me how you calculated these numbers, such as: *300K msg/sec to a kafka broker, 220 bytes per message*. I'm loading a text file with 36000 records into a kafka topic and I'd like to calculate the data rate (#samples per sec) in kafka.
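
A back-of-the-envelope sketch of the arithmetic: the message rate is records sent divided by the wall-clock seconds it took, and the byte rate multiplies that by the average message size. The elapsed time and message size below are illustrative placeholders.

    records = 36000          # lines in the text file pushed to the topic
    elapsed_seconds = 12.0   # hypothetical; measure with time.time() around the producer loop
    avg_msg_bytes = 220      # hypothetical average payload size

    msg_rate = records / elapsed_seconds      # messages per second
    byte_rate = msg_rate * avg_msg_bytes      # bytes per second

    print("%.0f msg/sec, %.1f KB/sec" % (msg_rate, byte_rate / 1024.0))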

Spark on Kubernetes: Kubernetes killing executors because of overallocation of memory

2018-08-02 Thread Jayesh Lalwani
We are running Spark 2.3 on a Kubernetes cluster. We have set the following spark configuration options: "spark.executor.memory": "7g", "spark.driver.memory": "2g", "spark.memory.fraction": "0.75". What we see is: a) In the Spark UI, 5G has been allocated to each executor, which makes sense
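
A quick sketch of the arithmetic behind the numbers in this thread, using the usual Spark defaults (300 MB reserved heap, overhead defaulting to max(10% of heap, 384 MB)); values are approximate.

    heap_mb = 7 * 1024                   # spark.executor.memory = 7g
    reserved_mb = 300                    # fixed reserved memory
    memory_fraction = 0.75               # spark.memory.fraction

    # Roughly what the Spark UI reports as execution + storage memory (~5 GB):
    unified_mb = (heap_mb - reserved_mb) * memory_fraction

    # Roughly what Kubernetes sets as the pod memory limit when no explicit
    # overhead is configured:
    overhead_mb = max(0.10 * heap_mb, 384)
    pod_limit_mb = heap_mb + overhead_mb

    print("UI executor memory: ~%.1f GB" % (unified_mb / 1024.0))
    print("pod memory limit:   ~%.1f GB" % (pod_limit_mb / 1024.0))
    # Off-heap usage (netty buffers, PySpark workers, etc.) beyond that
    # overhead is what gets the executor pod OOM-killed; see Matt's reply above.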

Re: Saving dataframes with partitionBy: append partitions, overwrite within each

2018-08-02 Thread Nirav Patel
I tried the following to explicitly specify partition columns in the sql statement, and also tried different cases (upper and lower) for the partition columns: insert overwrite table $tableName PARTITION(P1, P2) select A, B, C, P1, P2 from updateTable. Still getting: Caused by:

re: streaming, batch / spark 2.2.1

2018-08-02 Thread Peter Liu
Hello there, I'm new to spark streaming and have trouble understanding spark batch "composition" (google search keeps giving me an older spark streaming concept). Would appreciate any help and clarifications. I'm using spark 2.2.1 for a streaming workload (see quoted code in (a) below). The

Re: Saving dataframes with partitionBy: append partitions, overwrite within each

2018-08-02 Thread Nirav Patel
Thanks Koert. I'll check that out when we can update to 2.3. Meanwhile, I am trying a hive sql (INSERT OVERWRITE) statement to insert overwrite multiple partitions (without losing existing ones). It's giving me issues around partition columns. dataFrame.createOrReplaceTempView("updateTable")
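
A minimal sketch of the Spark 2.3 behavior this presumably refers to: with spark.sql.sources.partitionOverwriteMode set to dynamic, an overwrite insertInto replaces only the partitions present in the DataFrame and leaves the others untouched. Names follow the thread, the row data is made up, and 'mytable' is assumed to already exist, partitioned by (P1, P2).

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .enableHiveSupport()
        .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
        .getOrCreate()
    )

    df = spark.createDataFrame(
        [("a1", "b1", "c1", "p1a", "p2a")],
        ["A", "B", "C", "P1", "P2"],
    )

    # Only partition (P1=p1a, P2=p2a) is overwritten; other partitions are kept.
    df.write.mode("overwrite").insertInto("mytable")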

Re: Can we deploy python script on a spark cluster

2018-08-02 Thread amit kumar singh
Hi Lehak, You can make a scala project with an oozie class and one run class which will ship your python file to the cluster. Define an oozie coordinator with a spark action or shell action. We are deploying pyspark based machine learning code this way. Sent from my iPhone > On Aug 2, 2018, at 8:46 AM, Lehak

Can we deploy python script on a spark cluster

2018-08-02 Thread Lehak Dharmani
We are trying to deploy a python script on a spark cluster. However, as per the documentation, it is not possible to deploy python applications on a cluster. Is there any alternative? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

unsubscribe

2018-08-02 Thread Eco Super
Hi User, unsubscribe me