Unsubscribe

2022-04-28 Thread Ajay Thompson
Unsubscribe

Log4J 2 Support

2021-11-09 Thread Ajay Kumar
. Thanks in advance. Regards, Ajay

Re: Spark read csv option - capture exception in a column in permissive mode

2019-06-16 Thread Ajay Thompson
he column in the schema that you are using to read. Regards, Gourav. On Sun, Jun 16, 2019 at 2:48 PM wrote: Hi Team, Can we have another column which gives the corrupted record reason in permissive mode while reading csv? Thanks, Ajay
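
A minimal sketch of what the reader supports here, assuming Spark 2.x: permissive mode cannot report a failure reason, but the columnNameOfCorruptRecord option captures the raw malformed line in a column you declare in the schema (the input path is a placeholder):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("csv-permissive").getOrCreate()

    // The corrupt-record column must be declared in the schema as StringType.
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType),
      StructField("_corrupt_record", StringType)))

    val df = spark.read
      .option("mode", "PERMISSIVE")                           // keep malformed rows
      .option("columnNameOfCorruptRecord", "_corrupt_record") // raw bad line lands here
      .schema(schema)
      .csv("hdfs:///data/input.csv")                          // placeholder path

    df.filter("_corrupt_record IS NOT NULL").show()           // inspect the bad rows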

Re: Blockmgr directories intermittently not being cleaned up

2018-05-30 Thread Ajay
ut I believe this only pertains to standalone mode and we are using the Mesos deployment mode, so I don't think this flag actually does anything. Thanks, Jeff -- Thanks, Ajay

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

2018-05-23 Thread Ajay

Re: Does Spark shows logical or physical plan when executing job on the yarn cluster

2018-05-20 Thread Ajay
You can look at the Spark application UI at port 4040 (the per-application UI on the driver; the standalone master UI uses a different port). It should tell you all the currently running stages as well as past/future stages. On Sun, May 20, 2018, 12:22 AM giri ar wrote: Hi, Good Day. Could you please let me know whether we can see spark logical or
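
For the plan question itself, a short sketch (Spark 2.x shell shown, where spark is the session): explain(true) prints the parsed, analyzed, and optimized logical plans plus the physical plan without running the job; the SQL tab of the UI shows the same for executed queries.

    val df = spark.range(100).filter("id % 2 = 0").groupBy().count()
    df.explain(true) // logical + physical plans, no execution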

Re: UDTF registration fails for hiveEnabled SQLContext

2018-05-15 Thread Ajay

Re: Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-09 Thread Ajay Chander
java.io.IOException: Can't get Master Kerberos principal for use as renewer. sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println) // Getting this error: java.io.IOException: Can't get Master Kerberos principal for use as renewer } } On Mon, Nov 7, 2016
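
This error usually means the client-side Hadoop configuration lacks the service principals. A hedged sketch of one common remedy (principal names, realm, and keytab path are placeholders; the real values come from the cluster's hdfs-site.xml and yarn-site.xml):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    hadoopConf.set("dfs.namenode.kerberos.principal", "nn/_HOST@EXAMPLE.COM")  // placeholder
    hadoopConf.set("yarn.resourcemanager.principal", "rm/_HOST@EXAMPLE.COM")   // placeholder

    UserGroupInformation.setConfiguration(hadoopConf)
    UserGroupInformation.loginUserFromKeytab("myusr@EXAMPLE.COM", "/path/to/myusr.keytab")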

Re: Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-07 Thread Ajay Chander
Did anyone use https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala to interact with secured Hadoop from Spark? Thanks, Ajay On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander <itsche...@gmail.com> wrote: Hi Everyo

Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-07 Thread Ajay Chander
this from quite a while ago. Please let me know if you need more info. Thanks Regards, Ajay

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sean, thank you for making it clear. It was helpful. Regards, Ajay On Wednesday, October 26, 2016, Sean Owen <so...@cloudera.com> wrote: This usage is fine, because you are only using the HiveContext locally on the driver. It's applied in a function that's used on a Scal
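
A sketch of the pattern Sean describes, reusing the names from this thread: the HiveContext is only ever touched on the driver, over a plain collected array, never inside an RDD closure (table and column names are placeholders):

    // Driver-side loop: collect the 66 small rows first, then query Hive per value.
    val attrs = deDF.collect().map(_.getString(0))
    attrs.foreach { attr =>
      hiveContext.sql(
        s"INSERT INTO TABLE target_tbl SELECT * FROM source_tbl WHERE data_element = '$attr'")
    }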

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sunita, thanks for your time. In my scenario, based on each attribute from deDF (1 column with just 66 rows), I have to query a Hive table and insert into another table. Thanks, Ajay On Wed, Oct 26, 2016 at 12:21 AM, Sunita Arvind <sunitarv...@gmail.com> wrote: Ajay,

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
(AsynchronousListenerBus.scala:64) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181) at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) Thanks, Ajay On Tue, Oct 25, 2016 at 11:45 PM, Jeff Zhang <zjf...@gmail.com> wrote:

HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
org.apache.spark.sql.hive.HiveContext, and I see it is extending SQLContext which extends Logging with Serializable. Can anyone tell me if this is the right way to use it? Thanks for your time. Regards, Ajay

Re: Code review / sqlContext Scope

2016-10-19 Thread Ajay Chander
t.sql("set hive.exec.dynamic.partition.mode=nonstrict") val dataElementsFile = "hdfs://nameservice/user/ajay/spark/flds.txt" //deDF has only 61 rows val deDF = sqlContext.read.text(dataElementsFile).toDF("DataElement").coalesce(1).distinct().cache() deDF.wi

Re: Spark_JDBC_Partitions

2016-09-19 Thread Ajay Chander
Mich Talebzadeh <mich.talebza...@gmail.com> wrote: Strange that Oracle table of 200 million plus rows has not been partitioned. What matter

Spark_JDBC_Partitions

2016-09-10 Thread Ajay Chander
Any pointers are appreciated. Thanks for your time. ~ Ajay
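
The standard answer on this thread's topic, sketched for a numeric key (connection details are placeholders): Spark's JDBC reader issues numPartitions parallel range queries over partitionColumn between the two bounds.

    val df = sqlContext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder
      .option("dbtable", "BIG_TABLE")                        // placeholder
      .option("partitionColumn", "ID")                       // must be numeric
      .option("lowerBound", "1")
      .option("upperBound", "200000000")
      .option("numPartitions", "32")                         // 32 concurrent range scans
      .load()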

SPARK-8813 - combining small files in spark sql

2016-07-07 Thread Ajay Srivastava
it in spark 2.0? I did search commits done in 2.0 branch and looks like I need to use spark.sql.files.openCostInBytes but I am not sure. Regards, Ajay
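
For reference, the two Spark 2.0 settings usually paired for the small-files case (values illustrative, not prescriptions): raising openCostInBytes makes the planner pack more small files into each partition, bounded by maxPartitionBytes.

    spark.conf.set("spark.sql.files.maxPartitionBytes", 134217728) // 128 MB per-partition cap
    spark.conf.set("spark.sql.files.openCostInBytes", 4194304)     // assume 4 MB cost per file open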

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-22 Thread Ajay Chander
Thanks for the confirmation Mich! On Wednesday, June 22, 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: Hi Ajay, I am afraid for now transaction heart beats do not work through Spark, so I have no other solution. This is interesting point as with Hive

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-22 Thread Ajay Chander
. Regards, Ajay On Thursday, June 2, 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: thanks for that. I will have a look. Dr Mich Talebzadeh, LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-13 Thread Ajay Chander
providing the table name? Yes I did that too. It did not make any difference. Thank you, Ajay On Sunday, June 12, 2016, Mohit Jaggi <mohitja...@gmail.com> wrote: Looks like a bug in the code generating the SQL query… why would it be specific to SAS, I can’t guess. Did you

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-11 Thread Ajay Chander
I tried implementing the same functionality through Scala as well. But no luck so far. Just wondering if anyone here tried using Spark SQL to read a SAS dataset? Thank you. Regards, Ajay On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote: Mich, I completely agree wi

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
mmy SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM tmp """ HiveContext.sql(sqltext) println("\nFinished at"); sqlCo

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
9:05,935] INFO ps(2.1)#executeQuery SELECT "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590) Please find complete program and full logs attached in the below thread. Thank you. Regards, Ajay

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
Hi again, anyone in this group tried to access a SAS dataset through Spark SQL? Thank you. Regards, Ajay On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote: Hi Spark Users, I hope everyone here is doing great. I am trying to read data from

SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
Since both programs are using the same driver com.sas.rio.MVADriver, expected output should be the same as my pure Java program's output. But something else is happening behind the scenes. Any insights on this issue? Thanks for your time. Regards, Ajay Spark Code to
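
A hedged sketch of the JDBC path being debugged in this thread, assuming a SAS/SHARE server with com.sas.rio.MVADriver on the classpath (host, port, and credentials are placeholders):

    val sasDF = sqlContext.read.format("jdbc")
      .option("driver", "com.sas.rio.MVADriver")
      .option("url", "jdbc:sharenet://sashost:8551") // placeholder SAS/SHARE endpoint
      .option("dbtable", "sasLib.run_control")       // the dataset from the logs above
      .option("user", "myusr")                       // placeholder
      .option("password", "secret")                  // placeholder
      .load()
    sasDF.printSchema()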

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander
extracts inherently. But you can maintain a file e.g. extractRange.conf in hdfs, to read from it the end range and update it with the new end range from the spark job before it finishes, with the new relevant ranges to be used next time. On Tue, Jun 7, 2016 at 8:49 PM, Ajay C

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander
t into hdfs... perhaps there is some sort of spark 'connectors' that allows you to read data from a db directly so you don't need to go via spark streaming? hth

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander

Spark_Usecase

2016-06-07 Thread Ajay Chander
Hi Spark users, Right now we are using spark for everything (loading the data from sqlserver, applying transformations, saving it as permanent tables in hive) in our environment. Everything is being done in one spark application. The only thing we do before we launch our spark application through

Re: how to get file name of record being reading in spark

2016-05-31 Thread Ajay Chander
Hi Vikash, these are my thoughts: read the input directory using wholeTextFiles(), which would give a paired RDD with key as file name and value as file content. Then you can apply a map function to read each line and append the key to the content. Thank you, Aj On Tuesday, May 31, 2016, Vikash
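
A sketch of that suggestion (input path hypothetical); in later releases the SQL function input_file_name() gives the same information without pairing whole files.

    // wholeTextFiles yields (fileName, fileContent) pairs -- best suited to many small files
    val withNames = sc.wholeTextFiles("hdfs:///user/myusr/input")
      .flatMap { case (path, content) =>
        content.split("\n").map(line => (path, line)) // tag every line with its file
      }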

Re: Spark_API_Copy_From_Edgenode

2016-05-28 Thread Ajay Chander
Hi Everyone, any insights on this thread? Thank you. On Friday, May 27, 2016, Ajay Chander <itsche...@gmail.com> wrote: Hi Everyone, I have some data located on the EdgeNode. Right now, the process I follow to copy the data from Edgenode

Spark_API_Copy_From_Edgenode

2016-05-27 Thread Ajay Chander
Hi Everyone, I have some data located on the EdgeNode. Right now, the process I follow to copy the data from Edgenode to HDFS is through a shell script which resides on Edgenode. In Oozie I am using an SSH action to execute the shell script on Edgenode which copies the

Re: Hive_context

2016-05-24 Thread Ajay Chander
This way we can narrow down where the issue is? Sent from my iPhone. On May 23, 2016, at 5:26 PM, Ajay Chander <itsche...@gmail.com> wrote: I downloaded the spark 1.5 utilities and exported

Re: Hive_context

2016-05-23 Thread Ajay Chander
Regards, Aj On Monday, May 23, 2016, Ajay Chander <itsche...@gmail.com> wrote: Hi Everyone, I am building a Java Spark application in eclipse IDE. From my application I want to use hiveContext to read tables from the remote Hive (Hadoop cluster). On my machine I hav

Hive_context

2016-05-23 Thread Ajay Chander
Hi Everyone, I am building a Java Spark application in eclipse IDE. From my application I want to use hiveContext to read tables from the remote Hive (Hadoop cluster). On my machine I have exported $HADOOP_CONF_DIR = {$HOME}/hadoop/conf/. This path has all the remote cluster conf details like
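
If HADOOP_CONF_DIR alone doesn't get the HiveContext talking to the remote metastore, one common fallback is pointing it at the metastore explicitly; a sketch in Scala (the thread itself uses Java, and the host is a placeholder):

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    hc.setConf("hive.metastore.uris", "thrift://metastore-host:9083") // placeholder host
    hc.sql("SHOW TABLES").show()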

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Never mind! I figured it out by saving it as a hadoop file and passing the codec to it. Thank you! On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote: Hi, I have a folder temp1 in hdfs which has multiple format files test1.txt, test2.avsc (Avro file) in it
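
The fix described here, sketched: both saveAsHadoopFile and the two-argument saveAsTextFile accept a compression codec class, so the copy lands compressed (paths are placeholders).

    import org.apache.hadoop.io.compress.GzipCodec

    val data = sc.textFile("hdfs://source-cluster/temp1/part-*")           // placeholder
    data.saveAsTextFile("hdfs://target-cluster/temp1", classOf[GzipCodec]) // compressed output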

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
it. Is there any possible/efficient way to achieve this? Thanks, Aj On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote: I will try that out. Thank you! On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Deepak, thanks for your response. If I am correct, you suggest reading all of those files into an rdd on the cluster using wholeTextFiles, then applying a compression codec on it and saving the rdd to another Hadoop cluster? Thank you, Ajay On Tuesday, May 10, 2016, Deepak Sharma <deepa

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
I will try that out. Thank you! On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote: Yes that's what I intended to say. Thanks, Deepak On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com> wrote:

Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Everyone, we are planning to migrate the data between 2 clusters and I see distcp doesn't support data compression. Is there any efficient way to compress the data during the migration? Can I implement any spark job to do this? Thanks.

Re: Converting a string of format of 'dd/MM/yyyy' in Spark sql

2016-03-24 Thread Ajay Chander
Mich, can you try the value for paymentdata in this format: paymentdata='2015-01-01 23:59:59', to_date(paymentdate), and see if it helps. On Thursday, March 24, 2016, Tamas Szuromi wrote: Hi Mich, Take a look
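
For the 'dd/MM/yyyy' format in the subject, a sketch that works on the Spark SQL of that era, where to_date alone assumes yyyy-MM-dd (table and column names are placeholders):

    // Round-trip through a unix timestamp to parse the non-ISO pattern.
    val parsed = sqlContext.sql(
      "SELECT to_date(from_unixtime(unix_timestamp(paymentdate, 'dd/MM/yyyy'))) AS pay_dt FROM payments")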

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ajay Chander
Hi Everyone, a quick question within this context: what is the underneath persistent storage that you guys are using with regards to this containerized environment? Thanks On Thursday, March 10, 2016, yanlin wang wrote: How you guys make driver docker within container to be

Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread Ajay Chander
Hi Ashok, try using hiveContext instead of sqlContext. I suspect sqlContext does not have that functionality. Let me know if it works. Thanks, Ajay On Friday, March 4, 2016, ashokkumar rajendran <ashokkumar.rajend...@gmail.com> wrote: Hi Ayan, Thanks for the response.

Spark UI documentaton needed

2016-02-22 Thread Ajay Gupta
Hi Sparklers, Can you guys give an elaborate documentation of Spark UI as there are many fields in it and we do not know much about it.

Spark Streaming : Limiting number of receivers per executor

2016-02-10 Thread ajay garg
Hi All, I am running 3 executors in my spark streaming application with 3 cores per executor. I have written my custom receiver for receiving network data. In my current configuration I am launching 3 receivers, one receiver per executor. In the run, if 2 of my executors die, I am left

Re: send transformed RDD to s3 from slaves

2015-11-14 Thread Ajay
Hi Walrus, try caching the results just before calling the rdd.count. Regards, Ajay On Nov 13, 2015, at 7:56 PM, Walrus theCat <walrusthe...@gmail.com> wrote: Hi, I have an RDD which crashes the driver when being collected. I want to send the data on

Re: Spark_1.5.1_on_HortonWorks

2015-10-21 Thread Ajay Chander
that I can get it upgraded through the Ambari UI? If possible can anyone point me to any documentation online? Thank you. Regards, Ajay On Wednesday, October 21, 2015, Saisai Shao <sai.sai.s...@gmail.com> wrote: Hi Frans, You could download Spark 1.5.1-hadoop 2.6 pre-built t

Spark_sql

2015-10-21 Thread Ajay Chander
Hi Everyone, I have a use case where I have to create a DataFrame inside the map() function. To create a DataFrame it needs a sqlContext or hiveContext. Now how do I pass the context to my map function? And I am doing it in java. I tried creating a class "TestClass" which implements "Function

Spark_1.5.1_on_HortonWorks

2015-10-21 Thread Ajay Chander
HistoryServer.scala:231) at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala) I went to the lib folder and noticed that "spark-assembly-1.5.1-hadoop2.6.0.jar" is missing that class. I was able to get the spark history server started with 1.3.1 but not 1.5.1. Any inputs on this? Really appreciate your help. Thanks

Spark_1.5.1_on_HortonWorks

2015-10-20 Thread Ajay Chander
Hi Everyone, does anyone have any idea if spark-1.5.1 is available as a service on HortonWorks? I have spark-1.3.1 installed on the cluster and it is a HortonWorks distribution. Now I want to upgrade it to spark-1.5.1. Does anyone here have any idea about it? Thank you in advance. Regards, Ajay

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ajay Chander
. Why don't you see the hdfs logs and see what's happening when your application is talking to the namenode? I suspect some networking issue, or check if the datanodes are running fine. Thank you, Ajay On Saturday, October 3, 2015, Jacinto Arias <ja...@elrocin.es> wrote: Yes printing t

Re: JavaRDD using Reflection

2015-09-14 Thread Ajay Singal
this helps! Ajay On Mon, Sep 14, 2015 at 1:21 PM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote: Hi Rachana, I didn't get your question fully, but as the error says you cannot perform an rdd transformation or action inside another transformation. In your examp

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
Hi David, thanks for responding! My main intention was to submit a spark Job/jar to a yarn cluster from my eclipse within the code. Is there any way that I could pass my yarn configuration somewhere in the code to submit the jar to the cluster? Thank you, Ajay On Sunday, August 30, 2015, David
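
One supported way to do exactly this from application code (available since Spark 1.4) is org.apache.spark.launcher.SparkLauncher; a sketch with every path and class name a placeholder — HADOOP_CONF_DIR must still point at the cluster's configuration for YARN to be found.

    import org.apache.spark.launcher.SparkLauncher

    val proc = new SparkLauncher()
      .setSparkHome("/opt/spark")               // placeholder
      .setAppResource("/path/to/wordcount.jar") // placeholder
      .setMainClass("com.example.WordCount")    // placeholder
      .setMaster("yarn-cluster")
      .setConf("spark.executor.memory", "2g")
      .launch()                                 // returns a java.lang.Process
    proc.waitFor()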

submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
assumption is wrong or if I am missing anything here. I have attached the word count program that I was using. Any help is highly appreciated. Thank you, Ajay

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
David Wysakowicz <wysakowicz.da...@gmail.com> wrote: Hi Ajay, in short: no, there is no easy way to do that. But if you'd like to play around this topic, a good starting point would be this blog post from SequenceIQ: http://blog.sequenceiq.com/blog/2014/08/22/spark-submit

Re: Controlling number of executors on Mesos vs YARN

2015-08-13 Thread Ajay Singal
Hi Tim, An option like spark.mesos.executor.max to cap the number of executors per node/application would be very useful. However, having an option like spark.mesos.executor.num to specify the desirable number of executors per node would provide even better control. Thanks, Ajay On Wed, Aug

Re: Controlling number of executors on Mesos vs YARN

2015-08-13 Thread Ajay Singal
specify desirable number of executors. If not available, Mesos (in a simple implementation) can provide/offer whatever is available. In a slightly complex implementation, we can build a simple protocol to negotiate. Regards, Ajay On Wed, Aug 12, 2015 at 5:51 PM, Tim Chen t...@mesosphere.io wrote

Re: How to increase parallelism of a Spark cluster?

2015-08-03 Thread Ajay Singal
, and if needed, I will open a JIRA item. I hope it helps. Regards, Ajay On Mon, Aug 3, 2015 at 1:16 PM, Sujit Pal sujitatgt...@gmail.com wrote: @Silvio: the mapPartitions instantiates a HttpSolrServer, then for each query string in the partition, sends the query to Solr using SolrJ

Re: Facing problem in Oracle VM Virtual Box

2015-07-24 Thread Ajay Singal
this helps. Ajay On Thu, Jul 23, 2015 at 6:40 AM, Chintan Bhatt <chintanbhatt...@charusat.ac.in> wrote: Hi, I'm facing the following error while running a .ova file containing Hortonworks with Spark in Oracle VM Virtual Box: Failed to open a session for the virtual machine Hortonworks Sandbox

Re: ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!

2015-07-24 Thread Ajay Singal
Hi Joji, To my knowledge, Spark does not offer any such function. I agree, defining a function to find an open (random) port would be a good option. However, in order to invoke the corresponding SparkUI one needs to know this port number. Thanks, Ajay On Fri, Jul 24, 2015 at 10:19 AM, Joji
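
Two settings commonly used around this bind error, for reference (values illustrative): spark.ui.port 0 binds to any free ephemeral port, and spark.port.maxRetries raises the default ceiling of 16 sequential attempts; the chosen port then shows up in the driver log.

    val conf = new org.apache.spark.SparkConf()
      .set("spark.ui.port", "0")          // pick any free ephemeral port
      .set("spark.port.maxRetries", "64") // or just retry more than the default 16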

Re: ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!

2015-07-24 Thread Ajay Singal
... Thanks, Ajay On Fri, Jul 24, 2015 at 6:21 AM, Joji John <jj...@ebates.com> wrote: Hi, I am getting this error for some of my spark applications. I have multiple spark applications running in parallel. Is there a limit in the number of spark applications that I can run in parallel

PySpark Nested Json Parsing

2015-07-20 Thread Ajay

Re: Hive query execution from Spark(through HiveContext) failing with Apache Sentry

2015-06-17 Thread Ajay
Hi there! It seems like you have Read/Execute access permission (and no update/insert/delete access). What operation are you performing? Ajay On Jun 17, 2015, at 5:24 PM, nitinkak001 nitinkak...@gmail.com wrote: I am trying to run a hive query from Spark code using HiveContext object

Instantiating/starting Spark jobs programmatically

2015-04-20 Thread Ajay Singal
/tips/best-practices in this regard? Cheers! Ajay

Re: Some tasks are taking long time

2015-01-15 Thread Ajay Srivastava
Thanks RK. I can turn on speculative execution, but I am trying to find out the actual reason for the delay as it happens on any node. Any idea about the stack trace in my previous mail? Regards, Ajay On Thursday, January 15, 2015 8:02 PM, RK <prk...@yahoo.com.INVALID> wrote: If you don't want
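
The speculation knobs RK refers to, for reference (values are common starting points, not prescriptions):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.speculation", "true")
      .set("spark.speculation.multiplier", "1.5") // slow = 1.5x the median task time
      .set("spark.speculation.quantile", "0.75")  // start checking after 75% of tasks finish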

Re: Some tasks are taking long time

2015-01-15 Thread Ajay Srivastava
Thanks Nicos. GC does not contribute much to the execution time of the task. I will debug it further today. Regards, Ajay On Thursday, January 15, 2015 11:55 PM, Nicos <n...@hotmail.com> wrote: Ajay, unless we are dealing with some synchronization/conditional variable bug in Spark, try

Some tasks are taking long time

2015-01-15 Thread Ajay Srivastava
) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34) Any inputs/suggestions to improve job time will be appreciated. Regards, Ajay

Re: Creating RDD from only few columns of a Parquet file

2015-01-13 Thread Ajay Srivastava
Setting spark.sql.hive.convertMetastoreParquet to true has fixed this. Regards, Ajay On Tuesday, January 13, 2015 11:50 AM, Ajay Srivastava <a_k_srivast...@yahoo.com.INVALID> wrote: Hi, I am trying to read a parquet file using: val parquetFile = sqlContext.parquetFile("people.parquet

Creating RDD from only few columns of a Parquet file

2015-01-12 Thread Ajay Srivastava
is reading all the columns from disk in case of table1 when it needs only 3 columns. How should I make sure that it reads only 3 of 10 columns from disk? Regards, Ajay
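
The resolution from the reply above, sketched: with convertMetastoreParquet enabled, Spark's native Parquet reader applies column pruning, so a three-column projection reads only those column chunks (table name from the thread; column names are placeholders).

    sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
    val three = sqlContext.sql("SELECT col1, col2, col3 FROM table1") // reads 3 of 10 columns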

RDD for Storm Streaming in Spark

2014-12-23 Thread Ajay
Hi, Can we use Storm Streaming as an RDD in Spark? Or is there any way to get Spark to work with Storm? Thanks Ajay

Re: RDD for Storm Streaming in Spark

2014-12-23 Thread Ajay
Hi, the question is to do streaming in Spark with Storm (not using Spark Streaming). The idea is to use Spark as an in-memory computation engine, with static data coming from Cassandra/HBase and streaming data from Storm. Thanks Ajay On Tue, Dec 23, 2014 at 2:03 PM, Gerard Maas <gerard.m

Re: RDD for Storm Streaming in Spark

2014-12-23 Thread Ajay
Right. I contacted the SummingBird users as well. It doesn't support Spark Streaming currently. We are heading towards Storm as it is the most widely used. Is Spark Streaming production ready? Thanks Ajay On Tue, Dec 23, 2014 at 3:47 PM, Gerard Maas <gerard.m...@gmail.com> wrote: I'm not aware

Spark SQL Vs CQL performance on Cassandra

2014-12-11 Thread Ajay
) It takes around 0.6 second using Spark (either SELECT * FROM users WHERE name='Anna' or javaFunctions(sc).cassandraTable("test", "people", mapRowTo(Person.class)).where("name=?", "Anna")). Please let me know if I am missing something in the Spark configuration or the Cassandra-Spark driver. Thanks Ajay Garga

Clarifications on Spark

2014-12-04 Thread Ajay
Hadoop, HBase? We may use Cassandra/MongoDB/CouchBase as well. 4) Does Spark support RDBMS too? Can we have a single interface to pull out data from multiple data sources? 5) Any recommendations (not limited to usage of Spark) for our specific requirement described above. Thanks Ajay Note: I have

Re: Joined RDD

2014-11-13 Thread ajay garg
Yes that is my understanding of how it should work. But in my case when I call collect the first time, it reads the data from files on the disk. Subsequent collect queries are not reading data files (verified from the logs). On the Spark UI I see only shuffle read and no shuffle write.

Joined RDD

2014-11-12 Thread ajay garg
. Since no data is cached in spark, how is the action on C served without reading data from disk? Thanks --Ajay

Spark summit 2014 videos ?

2014-07-10 Thread Ajay Srivastava
Hi, I did not find any videos on the apache spark channel in youtube yet. Any idea when these will be made available? Regards, Ajay

OFF_HEAP storage level

2014-07-04 Thread Ajay Srivastava
also explain the behavior of storage level - NONE? Regards, Ajay

Map with filter on JavaRdd

2014-06-27 Thread ajay garg
Hi All, Is it possible to map and filter a javardd in a single operation? Thanks
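
The thread's resolution isn't captured in the snippet, but the usual single-pass idiom is flatMap, which can drop and transform elements at once; a sketch:

    val rdd = sc.parallelize(1 to 10)
    // filter (keep evens) and map (square) in one pass over the data
    val result = rdd.flatMap(x => if (x % 2 == 0) Some(x * x) else None)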

Re: Map with filter on JavaRdd

2014-06-27 Thread ajay garg
Thanks Mayur for clarification.

Re: Join : Giving incorrect result

2014-06-06 Thread Ajay Srivastava
Thanks Matei. We have tested the fix and it's working perfectly. Andrew, we set spark.shuffle.spill=false but the application goes out of memory. I think that is expected. Regards, Ajay On Friday, June 6, 2014 3:49 AM, Andrew Ash <and...@andrewash.com> wrote: Hi Ajay, Can you please try

Re: Join : Giving incorrect result

2014-06-05 Thread Ajay Srivastava
there will not be any mismatch of jars. On two workers, since executor memory gets doubled, the code works fine. Regards, Ajay On Thursday, June 5, 2014 1:35 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote: If this isn’t the problem, it would be great if you can post the code for the program. Matei

Join : Giving incorrect result

2014-06-04 Thread Ajay Srivastava
and looks correct. But when a single worker is used with two or more cores, the result seems to be random. Every time, the count of joined records is different. Does this sound like a defect, or do I need to take care of something while using join? I am using spark-0.9.1. Regards, Ajay