Re: Catalog, SessionCatalog and ExternalCatalog in spark 2.0

2016-09-03 Thread Kapil Malik
Thanks Raghavendra :) Will look into Analyzer as well. Kapil Malik

Catalog, SessionCatalog and ExternalCatalog in spark 2.0

2016-09-03 Thread Kapil Malik
Currently it looks like I need to extend SessionCatalog only. However, I just wanted to get feedback on whether there's a better / recommended approach to achieve this. Thanks and regards, Kapil Malik

Design query regarding dataframe usecase

2016-01-11 Thread Kapil Malik
Hi, We have an analytics use case where we are collecting user click logs. The data can be considered hierarchical, with 3 types of logs - User (attributes like userId, emailId) - Session (attributes like sessionId, device, OS, browser, city etc.) - PageView (attributes like url, referrer,
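
One way to make the hierarchy concrete is a nested schema. A minimal sketch, assuming Scala case classes and Spark 1.x-style DataFrame creation; the field names come from the attributes listed above, while the types and Seq-based nesting are assumptions about the use case:

    // Hypothetical nested schema for the click-log hierarchy above;
    // field names are from the thread, types and nesting are assumed.
    case class PageView(url: String, referrer: String)
    case class Session(sessionId: String, device: String, os: String,
                       browser: String, city: String, pageViews: Seq[PageView])
    case class User(userId: String, emailId: String, sessions: Seq[Session])

    // With a SQLContext in scope (Spark 1.3+), an RDD[User] becomes a
    // DataFrame whose schema mirrors the hierarchy as struct/array columns:
    //   import sqlContext.implicits._
    //   val df = userRdd.toDF()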

RE: Problem getting program to run on 15TB input

2015-06-06 Thread Kapil Malik
Very interesting and relevant thread for production-level usage of Spark. @Arun, can you kindly confirm if Daniel's suggestion helped your use case? Thanks, Kapil Malik | kma...@adobe.com | 33430 / 8800836581

RE: Passing around SparkContext within the Driver

2015-03-04 Thread Kapil Malik
Replace val sqlContext = new SQLContext(sparkContext) with @transient val sqlContext = new SQLContext(sparkContext)
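
For context, a minimal sketch of why the @transient annotation helps, assuming a Spark 1.x driver object that gets captured by task closures; the object and field names are illustrative, not from the original thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object DriverApp extends Serializable {
      // @transient keeps these fields out of Java serialization, so if this
      // object is captured by a task closure the contexts are not dragged
      // along (SparkContext itself is not serializable).
      @transient lazy val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[*]"))
      @transient lazy val sqlContext = new SQLContext(sc)

      def run(): Unit = {
        val rdd = sc.parallelize(1 to 10)
        println(rdd.map(_ * 2).collect().mkString(","))  // closure captures rdd only
      }
    }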

RE: does calling cache()/persist() on an RDD trigger its immediate evaluation?

2015-01-04 Thread Kapil Malik
Hi Pengcheng YIN, RDD cache / persist calls do not trigger evaluation. The unpersist call is blocking (it does have an async flavour, but I am not sure what the behavioural guarantees are). val rdd = sc.textFile().map() rdd.persist() // This does not trigger actual storage while(true){ val count =
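
A minimal runnable sketch of this laziness, assuming local mode and a hypothetical input path; persist() alone does nothing, while the first action both computes and caches:

    import org.apache.spark.{SparkConf, SparkContext}

    object PersistDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("persist-demo").setMaster("local[*]"))
        val rdd = sc.textFile("/tmp/input.txt").map(_.toUpperCase)  // hypothetical path
        rdd.persist()             // lazy: nothing is computed or stored yet
        val first = rdd.count()   // first action: computes the RDD and fills the cache
        val second = rdd.count()  // served from the cached partitions
        println(s"$first / $second")
        rdd.unpersist()           // blocking by default; unpersist(blocking = false) is the async flavour
        sc.stop()
      }
    }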

RE: FlatMapValues

2014-12-31 Thread Kapil Malik
Hi Sanjay, I tried running your code in the Spark shell piece by piece: // Setup val line1 = "025126,Chills,8.10,Injection site oedema,8.10,Injection site reaction,8.10,Malaise,8.10,Myalgia,8.10" val line2 = "025127,Chills,8.10,Injection site oedema,8.10,Injection site

RE: FlatMapValues

2014-12-31 Thread Kapil Malik
Hi Sanjay, Oh yes: flatMapValues is defined in PairRDDFunctions, and you need to import org.apache.spark.SparkContext._ to use those methods (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions ) @Sean, yes indeed flatMap / flatMapValues both can
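
A minimal sketch of flatMapValues with that import, using made-up pair data modeled on the thread's sample records; a local-mode master is assumed:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // RDD[(K, V)] => PairRDDFunctions (needed in Spark <= 1.2)

    object FlatMapValuesDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("fmv-demo").setMaster("local[*]"))
        val pairs = sc.parallelize(Seq(
          ("025126", Seq("Chills", "Injection site oedema")),
          ("025127", Seq("Chills"))))
        // flatMapValues keeps each key and flattens only the value side.
        val exploded = pairs.flatMapValues(symptoms => symptoms)
        exploded.collect().foreach(println)
        sc.stop()
      }
    }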

RE: Fwd: Sample Spark Program Error

2014-12-31 Thread Kapil Malik
Hi Naveen, Quoting http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext : "Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables"
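
A minimal sketch of the pattern that quote describes, with illustrative names and a local master: the single SparkContext is the factory for RDDs, accumulators (Spark 1.x API) and broadcast variables:

    import org.apache.spark.{SparkConf, SparkContext}

    object SampleApp {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sample").setMaster("local[2]"))
        val rdd = sc.parallelize(1 to 100)          // RDDs are created from the context
        val acc = sc.accumulator(0)                 // so are accumulators (Spark 1.x API)
        val table = sc.broadcast(Map("k" -> "v"))   // ...and broadcast variables
        rdd.foreach(n => acc += n)                  // executors update the accumulator
        println(s"sum=${acc.value}, broadcast=${table.value("k")}")
        sc.stop()
      }
    }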

RE: Determination of number of RDDs

2014-12-04 Thread Kapil Malik
Regarding: Can we create such an array and then parallelize it? Parallelizing an array of RDDs - i.e. RDD[RDD[X]] - is not possible; RDD is not serializable.
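
A minimal sketch of the usual workaround, assuming local mode: keep the array of RDDs as driver-side handles and combine them with sc.union rather than trying to parallelize the array itself:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    object UnionDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("union-demo").setMaster("local[*]"))
        val rdds: Array[RDD[Int]] = (1 to 3).map(i => sc.parallelize(Seq(i, i * 10))).toArray
        // sc.parallelize(rdds) compiles, but any action on it fails:
        // RDD is not serializable, so it cannot be shipped inside records.
        val combined = sc.union(rdds)  // valid: union of the driver-side handles
        println(combined.collect().mkString(","))
        sc.stop()
      }
    }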

RE: Snappy error with Spark SQL

2014-11-12 Thread Kapil Malik
/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64 export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar - pointing to the equivalent Snappy / MapReduce directory on your box. Thanks, Kapil Malik
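
For reference, a minimal sketch of the same fix expressed through SparkConf properties instead of environment variables; the spark.executor.extraClassPath / spark.executor.extraLibraryPath keys exist since Spark 1.0, and the paths below are illustrative and must match your installation:

    import org.apache.spark.{SparkConf, SparkContext}

    object SnappyClasspathDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("snappy-demo")
          .set("spark.executor.extraClassPath",
            "/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar")
          .set("spark.executor.extraLibraryPath",
            "/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64")
        val sc = new SparkContext(conf)
        // ... Spark SQL work reading Snappy-compressed data would go here ...
        sc.stop()
      }
    }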

RE: Help with processing multiple RDDs

2014-11-11 Thread Kapil Malik
Hi, How is the 78g distributed across driver, daemon and executor? Can you please paste the logs around "I don't have enough memory to hold the data in memory"? Are you collecting any data in the driver? Lastly, did you try a repartition to create smaller and evenly distributed partitions?
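
A minimal sketch of the repartition suggestion, with an assumed HDFS path and an illustrative partition count (the master is expected to come from spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}

    object RepartitionDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("repartition-demo"))
        val big = sc.textFile("hdfs:///data/large-input")  // hypothetical path
        println(s"input partitions: ${big.partitions.length}")
        // Full shuffle into 2000 partitions; use coalesce(n) instead if you
        // only want to shrink the partition count without a shuffle.
        val evened = big.repartition(2000)
        println(s"after repartition: ${evened.partitions.length}")
        sc.stop()
      }
    }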

RE: unsubscribe

2014-03-11 Thread Kapil Malik
Ohh! I thought you were unsubscribing :) Kapil Malik | kma...@adobe.com | 33430 / 8800836581