Re: Spark Thriftserver is failing when submitting a command from beeline

2021-08-20 Thread Artemis User
Looks like your problem is related to not setting up a hive-site.xml file properly. The standard Spark distribution doesn't include a hive-site.xml template file in the conf directory; you will have to create one yourself. Please refer to the Spark user doc and the Hive metastore config guide for
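
For anyone hitting the same error, the key setting a minimal hive-site.xml carries is usually the metastore URI; the same values can also be passed programmatically when building the session. A minimal sketch, assuming a thrift metastore at a placeholder host (none of these values come from the thread):

import org.apache.spark.sql.SparkSession

// Sketch only: the metastore URI and warehouse path below are placeholders,
// equivalent to what would otherwise live in conf/hive-site.xml.
val spark = SparkSession.builder()
  .appName("hive-metastore-example")
  .config("hive.metastore.uris", "thrift://metastore-host:9083")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SHOW DATABASES").show()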

Re: Is memory-only no-disk Spark possible?

2021-08-20 Thread Mich Talebzadeh
Thanks. Sounds like you experimented on-prem with HDFS and Spark using the same host nodes with data affinity. I am not sure it is something I can sell in a banking environment, so to speak. Bottom line, it will boil down to procuring more tin boxes on-prem to give Spark more memory, assuming that

How can I use sparkContext.addFile

2021-08-20 Thread igyu
In spark-shell I can run:

val url = "hdfs://nameservice1/user/jztwk/config.json"
spark.sparkContext.addFile(url)
val json_str = readLocalFile(SparkFiles.get(url.split("/").last))

but when I make a jar package: spark-submit --master yarn --deploy-mode cluster --principal jztwk/had...@join.com
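
A self-contained sketch of the pattern being described, with the session name corrected to the lowercase spark that spark-shell provides. Note that SparkFiles.get must be given the bare file name, not the full hdfs:// URL, which is a frequent cause of failures in yarn cluster mode. Only the HDFS path is from the thread; the rest is illustrative:

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession
import scala.io.Source

val spark = SparkSession.builder().appName("addFile-example").getOrCreate()

// Ship the file to the driver and every executor.
val url = "hdfs://nameservice1/user/jztwk/config.json"
spark.sparkContext.addFile(url)

// SparkFiles.get takes the bare file name and returns the path of the
// local copy on whichever node the call runs.
val localPath = SparkFiles.get(url.split("/").last)
val jsonStr = Source.fromFile(localPath).mkString
println(jsonStr)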

Re: Is memory-only no-disk Spark possible?

2021-08-20 Thread Bobby Evans
Yes, this is very much "use at your own risk". That said, at Yahoo we did something very similar to this on all of the YARN nodes and saw a decent performance uplift, even with HDFS running on the same nodes. I think we just changed the time to flush to 30 mins, but it was a long time

Re: Is memory-only no-disk Spark possible?

2021-08-20 Thread Mich Talebzadeh
Hi Bobby, on this statement of yours, if I may:

> ... If you really want to, you can configure the pagecache to not spill to disk until absolutely necessary. That should get you really close to pure in-memory processing, so long as you have enough free memory on the host to support it.

I would not

Re: Is memory-only no-disk Spark possible?

2021-08-20 Thread Bobby Evans
On the data path, Spark will write to a local disk when it runs out of memory and needs to spill or when doing a shuffle with the default shuffle implementation. The spilling is a good thing because it lets you process data that is too large to fit in memory. It is not great because the
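
Caching policy is one place where this spill behaviour is directly user-visible: MEMORY_ONLY drops partitions that don't fit (recomputing them later), while MEMORY_AND_DISK spills them. A minimal sketch, assuming an active session:

import org.apache.spark.storage.StorageLevel

// Assuming an active SparkSession named `spark` (as in spark-shell).
val df = spark.range(0L, 100000000L).toDF("id")

// MEMORY_ONLY never writes cached partitions to disk; anything that does
// not fit is dropped and recomputed on next use.
df.persist(StorageLevel.MEMORY_ONLY)

// The Dataset default, MEMORY_AND_DISK, spills what does not fit instead:
// df.persist(StorageLevel.MEMORY_AND_DISK)

df.count()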

Spark Thriftserver is failing when submitting a command from beeline

2021-08-20 Thread Pralabh Kumar
Hi Dev,

Environment details: Hadoop 3.2, Hive 3.1, Spark 3.0.3. Cluster: Kerberized.

1) Hive server is running fine.
2) Spark SQL, spark-shell, spark-submit are all working as expected.
3) Connecting to Hive through beeline is working fine (after kinit): beeline -u
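
For reference, connecting beeline to the Spark Thriftserver on a Kerberized cluster is equivalent to a plain JDBC connection whose URL carries the service principal. A hedged sketch: host, port and principal below are placeholders, the hive-jdbc driver is assumed on the classpath, and a valid kinit ticket is assumed.

import java.sql.DriverManager

// Placeholder host, port and Kerberos principal; adjust for your cluster.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val url = "jdbc:hive2://thriftserver-host:10000/default;principal=hive/_HOST@EXAMPLE.COM"
val conn = DriverManager.getConnection(url)
val rs = conn.createStatement().executeQuery("SHOW DATABASES")
while (rs.next()) println(rs.getString(1))
conn.close()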

Re: Is memory-only no-disk Spark possible?

2021-08-20 Thread Mich Talebzadeh
Well, I don't know what an "in-memory only Spark" is going to achieve. The Spark GUI shows the amount of disk usage pretty well. By default, memory is used first. Spark is no different from any predominantly in-memory application. Effectively it is doing the classical disk-based
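
The same memory-versus-disk split shown in the Spark GUI can also be read programmatically; getRDDStorageInfo is marked a developer API but works for quick checks. A small sketch, assuming an active session named spark (as in spark-shell):

// Print memory and disk usage for every cached RDD.
spark.sparkContext.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: memSize=${info.memSize} B, diskSize=${info.diskSize} B")
}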

Re: Is memory-only no-disk Spark possible?

2021-08-20 Thread Jack Kolokasis
Hello Jacek,

On 20/8/21 2:49 p.m., Jacek Laskowski wrote:
> Hi, I've been exploring BlockManager and the stores for a while now and am tempted to say that a memory-only Spark setup would be possible (except shuffle blocks). Is this correct?

Correct.

> What about shuffle blocks? Do they have

Is memory-only no-disk Spark possible?

2021-08-20 Thread Jacek Laskowski
Hi, I've been exploring BlockManager and the stores for a while now and am tempted to say that a memory-only Spark setup would be possible (except shuffle blocks). Is this correct? What about shuffle blocks? Do they have to be stored on disk (in DiskStore)? I think broadcast variables are
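
One workaround often mentioned for the shuffle-blocks constraint: shuffle files always go through the disk code path (DiskStore/DiskBlockManager), but spark.local.dir can point at a RAM-backed filesystem so they never reach a physical disk. A hedged sketch; the /dev/shm path is an assumption about a Linux host, and note that YARN ignores this setting in favour of its own local dirs:

import org.apache.spark.sql.SparkSession

// Shuffle and spill files will be written under a tmpfs mount, so the
// "disk" writes stay in RAM. Size /dev/shm generously or shuffles will fail.
val spark = SparkSession.builder()
  .appName("tmpfs-shuffle-example")
  .config("spark.local.dir", "/dev/shm/spark-local")
  .getOrCreate()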