Re: Spark_1.5.1_on_HortonWorks

2015-10-21 Thread Ajay Chander
we share the same belief." >> >> >> On Wed, Oct 21, 2015 at 12:24 PM, Doug Balog <doug.sparku...@dugos.com>> wrote: >> > I have been running 1.5.1 with Hive in secure mode on HDP 2.2.4

Spark_sql

2015-10-21 Thread Ajay Chander
Hi Everyone, I have a use case where I have to create a DataFrame inside the map() function. To create a DataFrame it needs a sqlContext or hiveContext. Now how do I pass the context to my map function? I am doing it in Java. I tried creating a class "TestClass" which implements "Function

Spark_1.5.1_on_HortonWorks

2015-10-21 Thread Ajay Chander
ostly the master node), Yarn will help to > distribute the Spark dependencies. The link I mentioned before is the one > you could follow, please read my previous mail. > > Thanks > Saisai > > > > On Thu, Oct 22, 2015 at 1:56 AM, Ajay Chander <itsche...@gmail.com>

Spark_1.5.1_on_HortonWorks

2015-10-20 Thread Ajay Chander
Hi Everyone, Does anyone have any idea if spark-1.5.1 is available as a service on HortonWorks? I have spark-1.3.1 installed on the cluster and it is a HortonWorks distribution. Now I want to upgrade it to spark-1.5.1. Does anyone here have any idea about it? Thank you in advance. Regards, Ajay

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
Mitchell jdavidmitch...@gmail.com wrote: Hi Ajay, Are you trying to save to your local file system or to HDFS? // This would save to HDFS under /user/hadoop/counter counter.saveAsTextFile("/user/hadoop/counter"); David On Sun, Aug 30, 2015 at 11:21 AM, Ajay Chander itsche...@gmail.com

submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
Hi Everyone, Recently we installed Spark on YARN in a HortonWorks cluster. Now I am trying to run a wordcount program in my Eclipse; I did setMaster(local) and I see the results as expected. Now I want to submit the same job to my YARN cluster from Eclipse. In Storm basically I

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
Ajay Chander itsche...@gmail.com: Hi David, Thanks for responding! My main intention was to submit a Spark job/jar to the YARN cluster from my Eclipse within the code. Is there any way that I could pass my YARN configuration somewhere in the code
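
A minimal Scala sketch of the approach usually suggested for this: put the cluster's Hadoop/YARN client configuration on the classpath (HADOOP_CONF_DIR / YARN_CONF_DIR) and set the master to yarn-client instead of local. The assembly location and input/output paths below are placeholders, not values taken from this thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountOnYarn {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("WordCountOnYarn")
          .setMaster("yarn-client") // instead of "local"; needs the cluster's *-site.xml on the classpath
          .set("spark.yarn.jar", "hdfs:///apps/spark/spark-assembly.jar") // hypothetical assembly location
        val sc = new SparkContext(conf)
        val counts = sc.textFile("hdfs:///user/hadoop/input")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.saveAsTextFile("hdfs:///user/hadoop/counter")
        sc.stop()
      }
    }

Submitting with spark-submit --master yarn-client remains the more common route; driving yarn-client directly from an IDE is mostly useful for quick tests against a dev cluster.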

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ajay Chander
Hi Jacin, If I were you, the first thing I would do is write a sample Java application to write data into HDFS and see if it's working fine. Metadata is being created in HDFS, which means communication to the namenode is working fine but not to the datanodes, since you don't see any data inside the
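
A small connectivity check along those lines (written in Scala here for consistency with the other examples in this digest); the namenode address and target path are placeholders. If the client can reach the namenode but not the datanodes, the failure typically shows up on close().

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsWriteCheck {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration() // picks up core-site.xml / hdfs-site.xml from the classpath
        val fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf) // hypothetical namenode address
        val out = fs.create(new Path("/tmp/hdfs_write_check.txt"))
        out.write("hello hdfs".getBytes("UTF-8"))
        out.close() // usually fails here if the datanodes are unreachable
        fs.close()
      }
    }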

Spark_Usecase

2016-06-07 Thread Ajay Chander
Hi Spark users, Right now we are using Spark for everything (loading the data from SQL Server, applying transformations, saving it as permanent tables in Hive) in our environment. Everything is being done in one Spark application. The only thing we do before we launch our Spark application through

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander
t allows you to read >> data from a db directly so you dont need to go via spk streaming? >> >> >> hth >> >> >> >> >> >> >> >> >> >> >> On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <itsche...@gmail.com >>
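
For the "connector" idea raised above, a hedged sketch of reading a SQL Server table directly through Spark's JDBC data source instead of going through streaming; the URL, table name, and credentials are placeholders, and the SQL Server JDBC driver jar must be on the classpath.

    import java.util.Properties
    import org.apache.spark.sql.SQLContext

    def loadFromSqlServer(sqlContext: SQLContext) = {
      val props = new Properties()
      props.setProperty("user", "myuser")         // placeholder credentials
      props.setProperty("password", "mypassword")
      props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      sqlContext.read.jdbc(
        "jdbc:sqlserver://dbhost:1433;databaseName=sales", // hypothetical connection URL
        "dbo.transactions",                                // hypothetical table
        props)
    }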

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander
t >>> into hdfs >>> >>> perhaps there is some sort of spark 'connectors' that allows you to read >>> data from a db directly so you dont need to go via spk streaming? >>> >>> >>> hth >>> >>> >>> >>> &

SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
Hi Spark Users, I hope everyone here is doing great. I am trying to read data from SAS through Spark SQL and write it into HDFS. Initially, I started with a pure Java program; please find the program and logs in the attached file sas_pure_java.txt. My program ran successfully and it returned the

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
Hi again, has anyone in this group tried to access a SAS dataset through Spark SQL? Thank you. Regards, Ajay On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote: > Hi Spark Users, > > I hope everyone here is doing great. > > I am trying to read data from

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
mmy > SELECT > ID > , CLUSTERED > , SCATTERED > , RANDOMISED > , RANDOM_STRING > , SMALL_VC > , PADDING > FROM tmp > """ >HiveContext.sql(sqltext) > println ("\nFinished at"); sqlCo

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
_id, cust_id from sales limit 2; > 17 28017 > 18 10419 > > HTH > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABU

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-11 Thread Ajay Chander
I tried implementing the same functionality through Scala as well, but no luck so far. Just wondering if anyone here has tried using Spark SQL to read a SAS dataset? Thank you. Regards, Ajay On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote: > Mich, I completely agree wi

SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-13 Thread Ajay Chander
try the same with another database? > As a workaround you can write the select statement yourself instead of just > providing the table name. > > On Jun 11, 2016, at 6:27 PM, Ajay Chander <itsche...@gmail.com> wrote: > > I tried implementing the same functionality through Scala

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander
extracts > inherently. > But you can maintain a file e.g. extractRange.conf in hdfs , to read from > it the end range and update it with new end range from spark job before it > finishes with the new relevant ranges to be used next time. > > On Tue, Jun 7, 2016 at 8:49 PM, Ajay C

Re: Spark_API_Copy_From_Edgenode

2016-05-28 Thread Ajay Chander
Hi Everyone, Any insights on this thread? Thank you. On Friday, May 27, 2016, Ajay Chander <itsche...@gmail.com> wrote: > Hi Everyone, > >I have some data located on the EdgeNode. Right > now, the process I follow to copy the data from Edgenode

Re: how to get file name of record being reading in spark

2016-05-31 Thread Ajay Chander
Hi Vikash, These are my thoughts: read the input directory using wholeTextFiles(), which gives a paired RDD with the key as the file name and the value as the file content. Then you can apply a map function to read each line and append the key to the content. Thank you, Aj On Tuesday, May 31, 2016, Vikash
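
A minimal sketch of that suggestion; the input directory is a placeholder. Note that wholeTextFiles() pulls each file entirely into memory, so it suits many small-to-medium files rather than a few huge ones.

    import org.apache.spark.SparkContext

    // (fileName, fileContent) pairs -> one "fileName<TAB>line" string per input line
    def linesWithFileName(sc: SparkContext, inputDir: String) =
      sc.wholeTextFiles(inputDir).flatMap { case (fileName, content) =>
        content.split("\n").map(line => s"$fileName\t$line")
      }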

Spark_API_Copy_From_Edgenode

2016-05-27 Thread Ajay Chander
Hi Everyone, I have some data located on the EdgeNode. Right now, the process I follow to copy the data from the EdgeNode to HDFS is through a shell script which resides on the EdgeNode. In Oozie I am using an SSH action to execute the shell script on the EdgeNode, which copies the

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-22 Thread Ajay Chander
ve. > > Cheers, > > > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-22 Thread Ajay Chander
Hi Mich, Right now I have a similar use case where I have to delete some rows from a Hive table. My Hive table is ORC, bucketed, and has the transactional property enabled. I can delete from the Hive shell but not from my spark-shell or Spark app. Were you able to find any workaround? Thank you.

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ajay Chander
Hi Everyone, a quick question within this context: what is the underlying persistent storage that you are using with regard to this containerized environment? Thanks On Thursday, March 10, 2016, yanlin wang wrote: > How you guys make driver docker within container to be

Re: Converting a string of format of 'dd/MM/yyyy' in Spark sql

2016-03-24 Thread Ajay Chander
Mich, Can you try changing the value of paymentdate to this format, paymentdate='2015-01-01 23:59:59', with to_date(paymentdate), and see if it helps? On Thursday, March 24, 2016, Tamas Szuromi wrote: > Hi Mich, > > Take a look >
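
If the source strings really are in 'dd/MM/yyyy' form, one common way to convert them in Spark SQL (a sketch only, with made-up table and column names, and not necessarily what this thread settled on) is to parse with unix_timestamp and then apply to_date:

    sqlContext.sql(
      """SELECT to_date(from_unixtime(unix_timestamp(paymentdate, 'dd/MM/yyyy'))) AS payment_dt
        |FROM payments""".stripMargin)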

Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread Ajay Chander
Hi Ashok, Try using hiveContext instead of sqlContext. I suspect sqlContext does not have that functionality. Let me know if it works. Thanks, Ajay On Friday, March 4, 2016, ashokkumar rajendran < ashokkumar.rajend...@gmail.com> wrote: > Hi Ayan, > > Thanks for the response. I am using SQL
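
A small sketch of that suggestion (the table and column names are invented): create a HiveContext over the existing SparkContext so the query runs with Hive's function set, which includes floor(). Whether a plain SQLContext also supports it depends on the Spark version.

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc) // sc is the existing SparkContext
    hiveContext.sql("SELECT id, floor(amount) AS amt FROM events").show()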

Hive_context

2016-05-23 Thread Ajay Chander
Hi Everyone, I am building a Java Spark application in the Eclipse IDE. From my application I want to use hiveContext to read tables from the remote Hive (Hadoop cluster). On my machine I have exported $HADOOP_CONF_DIR = {$HOME}/hadoop/conf/. This path has all the remote cluster conf details like

Re: Hive_context

2016-05-23 Thread Ajay Chander
gards, Aj On Monday, May 23, 2016, Ajay Chander <itsche...@gmail.com> wrote: > Hi Everyone, > > I am building a Java Spark application in eclipse IDE. From my application > I want to use hiveContext to read tables from the remote Hive(Hadoop > cluster). On my machine I hav

Re: Hive_context

2016-05-24 Thread Ajay Chander
t; This way we can narrow down where the issue is ? > > > Sent from my iPhone > > On May 23, 2016, at 5:26 PM, Ajay Chander <itsche...@gmail.com>> wrote: > > I downloaded the spark 1.5 utilities and exported

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
kmc...@gmail.com> wrote: > Hi Ajay > You can look at wholeTextFiles method of rdd[string,string] and then map > each of rdd to saveAsTextFile . > This will serve the purpose . > I don't think if anything default like distcp exists in spark > > Thanks > Deepak > On 10

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
it. Is there any possible/efficient way to achieve this? Thanks, Aj On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote: > I will try that out. Thank you! > > On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com >

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Never mind! I figured it out by saving it as hadoopfile and passing the codec to it. Thank you! On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote: > Hi, I have a folder temp1 in hdfs which have multiple format files > test1.txt, test2.avsc (Avro file) in it
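
A minimal sketch of passing a compression codec while copying (the paths are placeholders); saveAsTextFile accepts a codec class directly, and saveAsHadoopFile can be configured the same way:

    import org.apache.hadoop.io.compress.GzipCodec

    sc.textFile("hdfs://source-nn:8020/temp1/test1.txt")
      .saveAsTextFile("hdfs://target-nn:8020/temp1_copy/test1", classOf[GzipCodec])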

Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Everyone, we are planning to migrate data between 2 clusters and I see that distcp doesn't support data compression. Is there any efficient way to compress the data during the migration? Can I implement a Spark job to do this? Thanks.

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
I will try that out. Thank you! On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote: > Yes that's what I intended to say. > > Thanks > Deepak > On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com >

Spark_JDBC_Partitions

2016-09-10 Thread Ajay Chander
Hello Everyone, My goal is to use Spark SQL to load a huge amount of data from Oracle to HDFS. *Table in Oracle:* 1) No primary key. 2) Has 404 columns. 3) Has 200,800,000 rows. *Spark SQL:* In my Spark SQL I want to read the data into n partitions in parallel, for which I need to
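
One workaround often suggested when the table has no numeric key to use as partitionColumn (a hedged sketch with placeholder URL, schema, and credentials) is to hand the JDBC reader one explicit predicate per partition, for example built on Oracle's ORA_HASH over the ROWID:

    import java.util.Properties
    import org.apache.spark.sql.SQLContext

    def readOracleInPartitions(sqlContext: SQLContext, numPartitions: Int) = {
      // one WHERE-clause predicate per partition, e.g. ORA_HASH(ROWID, 19) = 0 .. 19
      val predicates = (0 until numPartitions)
        .map(i => s"ORA_HASH(ROWID, ${numPartitions - 1}) = $i")
        .toArray
      val props = new Properties()
      props.setProperty("user", "scott")                 // placeholder credentials
      props.setProperty("password", "tiger")
      props.setProperty("driver", "oracle.jdbc.OracleDriver")
      sqlContext.read.jdbc(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL",          // hypothetical connection URL
        "MYSCHEMA.BIG_TABLE",                            // hypothetical 404-column table
        predicates, props)
    }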

Re: Spark_JDBC_Partitions

2016-09-19 Thread Ajay Chander
zadeh < >>>>>> mich.talebza...@gmail.com> wrote: >>>>>> >>>>>>> Strange that Oracle table of 200Million plus rows has not been >>>>>>> partitioned. >>>>>>> >>>>>>> What matter

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
f you can create the dataframe in main, you can register it as a table > and run the queries in main method itself. You don't need to coalesce or > run the method within foreach. > > Regards > Sunita > > On Tuesday, October 25, 2016, Ajay Chander <itsche...@gmail.com> wrote:

HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Hi Everyone, I was wondering whether I can use hiveContext inside foreach like below: object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf() val sc = new SparkContext(conf) val hiveContext = new HiveContext(sc) val dataElementsFile = args(0) val deDF =

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
> In your sample code, you can use hiveContext in the foreach as it is scala > List foreach operation which runs in driver side. But you cannot use > hiveContext in RDD.foreach > > > > Ajay Chander <itsche...@gmail.com> wrote on Wed, Oct 26, 2016 at 11:28 AM: > >> Hi Everyone,
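
A short sketch of the distinction made above, reusing the sc and hiveContext from the snippet earlier in the thread (the table name and values are invented): foreach over a plain Scala collection runs on the driver, where hiveContext is available, while foreach over an RDD runs on the executors, where it is not.

    val dataElements = Seq("elem_a", "elem_b") // hypothetical values read on the driver

    // Fine: Scala Seq/List foreach executes on the driver
    dataElements.foreach { de =>
      hiveContext.sql(s"SELECT count(*) FROM my_table WHERE de = '$de'").show()
    }

    // Not fine: the closure below would be shipped to the executors, where hiveContext is unusable
    // sc.parallelize(dataElements).foreach { de =>
    //   hiveContext.sql(s"SELECT count(*) FROM my_table WHERE de = '$de'")
    // }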

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
The NPE you see is an unrelated cosmetic problem that was fixed in 2.0.1 > IIRC. > > On Wed, Oct 26, 2016 at 4:28 AM Ajay Chander <itsche...@gmail.com>> wrote: > >> Hi Everyone, >> >> I was thinking if I

Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-07 Thread Ajay Chander
Hi Everyone, I am trying to develop a simple codebase on my machine to read data from a secured Hadoop cluster. We have a development cluster which is secured through Kerberos, and I want to run a Spark job from my IntelliJ to read some sample data from the cluster. Has anyone done this before? Can

Re: Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-09 Thread Ajay Chander
tion: Can't get Master Kerberos principal for use as renewer sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println) //Getting this error: java.io.IOException: Can't get Master Kerberos principal for use as renewer } } On Mon, Nov 7, 2016
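
The "Can't get Master Kerberos principal for use as renewer" error usually means the client is not picking up the secured cluster's site configuration. A hedged sketch of one common fix, with placeholder paths, principal, and keytab (the HDFS URI is the one from this thread): load the cluster's *-site.xml files into the Hadoop Configuration and log in from a keytab before reading.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.security.UserGroupInformation

    val hadoopConf = new Configuration()
    hadoopConf.addResource(new Path("/home/myusr/conf/core-site.xml")) // copied from the secured cluster
    hadoopConf.addResource(new Path("/home/myusr/conf/hdfs-site.xml"))
    hadoopConf.addResource(new Path("/home/myusr/conf/yarn-site.xml"))
    UserGroupInformation.setConfiguration(hadoopConf)
    UserGroupInformation.loginUserFromKeytab("myusr@COMP.COM", "/home/myusr/myusr.keytab")

    sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)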

Re: Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-07 Thread Ajay Chander
Has anyone used https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala to interact with secured Hadoop from Spark? Thanks, Ajay On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander <itsche...@gmail.com> wrote: > > Hi Everyo

Re: Code review / sqlContext Scope

2016-10-19 Thread Ajay Chander
quot;).mode("Append" ).insertInto("devl_df2_spf_batch.spf_supplier_trans_metric_detl_base_1") } } } This is my cluster( Spark 1.6.0 on Yarn, Cloudera 5.7.1) configuration, Memory -> 4.10 TB VCores -> 544 I am deploying the application in yarn client mode and the cluste