Information required

2016-12-09 Thread Rishabh Wadhawan
Does anyone know the repository link for the source of GroupId: org.spark-project.hive, artifact 1.2.1.spark? I was able to find https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2, but that is artifact 1.2.1.spark2, not 1.2.1.spark.

Re: StackOverflow in Spark

2016-06-01 Thread Rishabh Wadhawan
A StackOverflowError is generated when the DAG grows too long, as happens when many transformations accumulate over lots of iterations. Please use checkpointing to persist the intermediate state and break the lineage to get away from this stack overflow error. Look into the checkpoint function. Thanks, hope it helps. Let me know if you need any more
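A minimal sketch of that suggestion in Scala. The app name, checkpoint directory, and iteration/checkpoint counts are illustrative assumptions, not from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo"))
    sc.setCheckpointDir("/tmp/spark-checkpoints") // hypothetical directory

    var rdd = sc.parallelize(1 to 1000)
    for (i <- 1 to 200) {
      rdd = rdd.map(_ + 1)   // lineage grows by one step per iteration
      if (i % 50 == 0) {
        rdd.checkpoint()     // mark the RDD for checkpointing
        rdd.count()          // an action forces materialization, truncating the lineage
      }
    }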

Re: Passing a dataframe to where clause + Spark SQL

2016-02-11 Thread Rishabh Wadhawan
Hi Divya, I am assuming you are able to successfully load both tables, testCond and test, as data frames. Now, taking your case: when you do val condval = testCond.select("Cond") // where Cond is a column name, condval is a DataFrame. Even if it has only one row, it is still a data frame; if you want
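A hedged sketch of where this usually leads: to use the single value in a where clause, extract it from the DataFrame first. Only testCond, test, and Cond come from the thread; the column name someCol and the string type are assumptions:

    // select() returns a DataFrame, not a scalar; pull the value out explicitly
    val condval: String = testCond.select("Cond").first().getString(0)
    val filtered = test.where(s"someCol = '$condval'") // someCol is hypothetical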

Re: Scala types to StructType

2016-02-11 Thread Rishabh Wadhawan
I had the same issue. I resolved it in Java, but I am pretty sure it would work with Scala too. It's kind of a gross hack, but here is what I did: say I had a table in MySQL with 1000 columns; I issued a JDBC query to extract the schema of the table. I stored that schema and wrote
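A sketch of that idea in Scala, under stated assumptions: the connection URL, credentials, and table name are placeholders, and the JDBC-to-Spark type mapping is deliberately minimal:

    import java.sql.{DriverManager, Types}
    import org.apache.spark.sql.types._

    val conn = DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass")
    val cols = conn.getMetaData.getColumns(null, null, "my_table", null)
    val fields = scala.collection.mutable.ArrayBuffer[StructField]()
    while (cols.next()) {
      val name = cols.getString("COLUMN_NAME")
      val dataType = cols.getInt("DATA_TYPE") match {
        case Types.INTEGER => IntegerType
        case Types.BIGINT  => LongType
        case Types.DOUBLE  => DoubleType
        case _             => StringType // crude fallback for unmapped types
      }
      fields += StructField(name, dataType, nullable = true)
    }
    val schema = StructType(fields.toArray)
    conn.close()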

Re: Dataframes

2016-02-11 Thread Rishabh Wadhawan
… Please ask me if you have any other questions too. Thanks, regards, Rishabh Wadhawan > On Feb 11, 2016, at 9:47 AM, Gaurav Agarwal <gaurav130...@gmail.com> wrote: > > SQLContext sContext = new SQLContext(sc) > DataFrame df = sContext.load("jdbc", …
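For context, a hedged sketch of the equivalent JDBC load in Scala with the Spark 1.x reader API; the URL, table, and credentials are placeholders:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.format("jdbc")
      .option("url", "jdbc:mysql://host:3306/db")
      .option("dbtable", "my_table")
      .option("user", "user")
      .option("password", "pass")
      .load()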

Re: Spark executor Memory profiling

2016-02-11 Thread Rishabh Wadhawan
Hi all, please check this JIRA ticket regarding the issue. I was having the same issue with shuffling. It seems the maximum size of a shuffle block is 2 GB. https://issues.apache.org/jira/browse/SPARK-5928 > On Feb 11, 2016, at 9:08 AM,
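A common workaround for that 2 GB shuffle-block limit, sketched in Scala; df and sqlContext are assumed to be in scope, and the partition count is illustrative:

    // More partitions means smaller shuffle blocks per partition
    val repartitioned = df.repartition(2000)
    // For Spark SQL shuffles, the equivalent knob is a config setting
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000")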

Re: Low latency queries much slower in 1.6.0

2016-02-03 Thread Rishabh Wadhawan
Hi Younes. When you have multiple users connected to Hive, or multiple applications trying to access shared memory, my recommendation would be to store it off-heap rather than on disk. Check out this link and see RDD Persistence
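A minimal sketch of off-heap persistence in Scala; rdd is assumed to be in scope, and note that in Spark 1.x the OFF_HEAP storage level was backed by Tachyon, which must be configured separately:

    import org.apache.spark.storage.StorageLevel

    val cached = rdd.persist(StorageLevel.OFF_HEAP) // keeps cached data out of the JVM heap and off disk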

Re: Spark 1.5.2 memory error

2016-02-03 Thread Rishabh Wadhawan
As far as I know, cores won't give you a larger portion of executor memory, because they are just the CPU cores you are using per executor. Reducing the number of cores, however, would reduce your parallel processing power. The executor memory that we specify with spark.executor.memory would be
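To make the two knobs concrete, a hedged Scala sketch; the values are illustrative only:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "4g") // heap per executor
      .set("spark.executor.cores", "4")   // parallelism per executor, not memory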

Re: Spark 1.5.2 memory error

2016-02-03 Thread Rishabh Wadhawan
Hi, I suppose you are using --master yarn-client or yarn-cluster. Can you try boosting spark.yarn.driver.memoryOverhead, overriding it to 0.15 * executor memory rather than the default 0.1? Check out this link: https://spark.apache.org/docs/1.5.2/running-on-yarn.html
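A hedged sketch of that override in Scala; the 4g executor size is an assumption, and the overhead value is in megabytes (0.15 * 4096 MB is roughly 614 MB):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")
      .set("spark.yarn.driver.memoryOverhead", "614") // MB, ~15% instead of the default ~10%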

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Rishabh Wadhawan
Hi Nirav, there is a difference between dynamic resource allocation and the shuffle service. With dynamic allocation, once you enable the configuration for it, every time you run a job Spark will determine the number of executors required to run it for you, which means decreasing the
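A minimal sketch of the two settings being contrasted, in Scala; the executor bounds are illustrative. Dynamic allocation requires the external shuffle service so executors can be removed without losing their shuffle files:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")       // external shuffle service
      .set("spark.dynamicAllocation.minExecutors", "2")   // illustrative bounds
      .set("spark.dynamicAllocation.maxExecutors", "20")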

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Rishabh Wadhawan
Check out this link, http://spark.apache.org/docs/latest/configuration.html, and look at the spark.shuffle.service.* settings. Thanks. > On Feb 3, 2016, at 1:02 PM, Marcelo Vanzin wrote: > > Yes, but you don't necessarily need to use