Hi,

I have been using Hive on the Spark engine (with Hive tables) and it is pretty impressive compared to Hive on the MR engine.
Let us assume that I use spark-shell. spark-shell is a client that connects to the Spark master running on a given host and port, like this:

    spark-shell --master spark://50.140.197.217:7077

Once connected, I create an RDD to read a text file:

    val oralog = sc.textFile("/test/alert_mydb.log")

I then search for the word "Errors" in that file:

    oralog.filter(line => line.contains("Errors")).collect().foreach(line => println(line))

Questions:

1. In order to display the lines (the result set) containing the word "Errors", the content of the file (i.e. the blocks on HDFS) needs to be read into memory. Is my understanding correct that, as per the RDD notes, those blocks from the file will be partitioned across the cluster, with each node holding its share of partitions in memory?

2. Once the result is computed, it needs to be sent back to the client that made the connection to the master. I assume this is a simple TCP operation, much like a relational database sending a result set back?

3. Once the results are returned, if no request has been made to keep the data in memory, will those in-memory blocks be discarded?

4. Regardless of the storage block size on disk (128 MB, 256 MB, etc.), memory pages are 2 KB in relational databases. Is this the case in Spark as well?

Thanks,

Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
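For concreteness, the spark-shell session described above can also be sketched as a small standalone Scala job. This is only a minimal sketch, not an answer to the questions: the master URL and HDFS path are the ones from the email, and cache() is the standard RDD call for requesting that partitions be kept in memory after the first action (relevant to question 3 — without it, the deserialized partitions are not pinned and the file is re-read on subsequent actions).

```scala
// Minimal sketch of the session described in the email (classic RDD API).
// Assumes a reachable standalone Spark master at the address given above
// and the alert log already present on HDFS.
import org.apache.spark.{SparkConf, SparkContext}

object AlertLogScan {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("AlertLogScan")
      .setMaster("spark://50.140.197.217:7077")
    val sc = new SparkContext(conf)

    val oralog = sc.textFile("/test/alert_mydb.log")

    // Question 1: the file is split into partitions spread across executors.
    println(s"Number of partitions: ${oralog.getNumPartitions}")

    // Question 3: explicitly request that partitions stay in memory
    // after the first action; omit this and they are eligible for eviction.
    oralog.cache()

    // filter() is a lazy transformation; collect() is the action that
    // triggers the job and ships matching lines back to the driver
    // over the network (question 2).
    val errors = oralog.filter(line => line.contains("Errors")).collect()
    errors.foreach(println)

    sc.stop()
  }
}
```

Running this requires the spark-core dependency on the classpath and a live cluster, so it is meant as an illustration of the steps rather than a self-contained test.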