we share the same belief."
>>
>>
>> On Wed, Oct 21, 2015 at 12:24 PM, Doug Balog <doug.sparku...@dugos.com> wrote:
>> > I have been running 1.5.1 with Hive in secure mode on HDP 2.2.4
Hi Everyone,
I have a use case where I have to create a DataFrame inside the map()
function. Creating a DataFrame needs a sqlContext or hiveContext, so how
do I pass the context to my map function? And I am doing it in Java. I
tried creating a class "TestClass" which implements "Function
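For what it's worth, SQLContext/HiveContext live on the driver and are not serializable, so they cannot be captured by a map() closure that runs on executors. A minimal sketch of the usual restructure (in Scala rather than Java; the "name,age" record format and all names are hypothetical): keep the per-record logic pure, and create the DataFrame once, on the driver, from the mapped results.

```scala
// Sketch only: Person, parseRecord and the "name,age" format are assumptions.
case class Person(name: String, age: Int)

// Pure per-record transformation: safe to use inside rdd.map(...)
def parseRecord(line: String): Person = {
  val Array(name, age) = line.split(",").map(_.trim)
  Person(name, age.toInt)
}

// On a real cluster this would be:
//   val peopleRDD = sc.textFile(path).map(parseRecord)  // executors, no context needed
//   val df = sqlContext.createDataFrame(peopleRDD)      // driver, context available
// Here a local List stands in for the RDD:
val people = List("alice, 30", "bob, 25").map(parseRecord)
println(people)
```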
ostly the master node), Yarn will help to
> distribute the Spark dependencies. The link I mentioned before is the one
> you could follow; please read my previous mail.
>
> Thanks
> Saisai
>
>
>
> On Thu, Oct 22, 2015 at 1:56 AM, Ajay Chander <itsche...@gmail.com>
Hi Everyone,
Does anyone know if spark-1.5.1 is available as a service on
HortonWorks? I have spark-1.3.1 installed on the cluster, which is a
HortonWorks distribution, and I want to upgrade it to spark-1.5.1. Does
anyone here know anything about this? Thank you in advance.
Regards,
Ajay
Mitchell jdavidmitch...@gmail.com wrote:
Hi Ajay,
Are you trying to save to your local file system or to HDFS?
// This would save to HDFS under /user/hadoop/counter
counter.saveAsTextFile("/user/hadoop/counter");
David
On Sun, Aug 30, 2015 at 11:21 AM, Ajay Chander itsche...@gmail.com
Hi Everyone,
Recently we installed Spark on YARN in a Hortonworks cluster. I am
trying to run a wordcount program in my Eclipse; I
did setMaster("local") and the results are as expected. Now I want
to submit the same job to my YARN cluster from my Eclipse. In Storm,
basically I
Ajay Chander <itsche...@gmail.com>:
Hi David,
Thanks for responding! My main intention was to submit a Spark job/jar to
the YARN cluster from my Eclipse, from within the code. Is there any way
I could pass my YARN configuration somewhere in the code
Hi Jacin,
If I were you, the first thing I would do is write a sample Java
application to write data into HDFS and see if it works fine. If metadata
is being created in HDFS, that means communication with the namenode is
working fine but not with the datanodes, since you don't see any data inside the
Hi Spark users,
Right now we are using Spark for everything (loading the data from
SQL Server, applying transformations, saving it as permanent tables in Hive) in
our environment. Everything is done in one Spark application.
The only thing we do before we launch our spark application through
t allows you to read
>> data from a db directly so you don't need to go via Spark streaming?
>>
>>
>> hth
>>
>> On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <itsche...@gmail.com
>>
t
>>> into hdfs
>>>
>>> perhaps there is some sort of Spark 'connectors' that allows you to read
>>> data from a db directly so you don't need to go via Spark streaming?
>>>
>>>
>>> hth
>>>
Hi Spark Users,
I hope everyone here is doing great.
I am trying to read data from SAS through Spark SQL and write it into HDFS.
Initially, I started with a pure Java program; please find the program and
logs in the attached file sas_pure_java.txt. My program ran successfully
and it returned the
Hi again, has anyone in this group tried to access a SAS dataset through
Spark SQL? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi Spark Users,
>
> I hope everyone here is doing great.
>
> I am trying to read data from
mmy
> SELECT
> ID
> , CLUSTERED
> , SCATTERED
> , RANDOMISED
> , RANDOM_STRING
> , SMALL_VC
> , PADDING
> FROM tmp
> """
> HiveContext.sql(sqltext)
> println ("\nFinished at"); sqlCo
_id, cust_id from sales limit 2;
> 17 28017
> 18 10419
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
I tried implementing the same functionality through Scala as well, but no
luck so far. Just wondering if anyone here has tried using Spark SQL to
read a SAS dataset? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Mich, I completely agree wi
try the same with another database?
> As a workaround you can write the select statement yourself instead of just
> providing the table name.
>
> On Jun 11, 2016, at 6:27 PM, Ajay Chander <itsche...@gmail.com> wrote:
>
> I tried implementing the same functionality through Scala
extracts
> inherently.
> But you can maintain a file, e.g. extractRange.conf in HDFS: read the end
> range from it, and have the Spark job update it with the new end range
> before it finishes, so the relevant ranges are used next time.
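The bookkeeping described above might look like this. A sketch assuming the conf file holds a single "start,end" line; the HDFS read/write itself (e.g. via Hadoop's FileSystem API) is left as comments:

```scala
// Hypothetical sketch of the extractRange.conf bookkeeping.
def parseRange(line: String): (Long, Long) = {
  val Array(start, end) = line.trim.split(",").map(_.toLong)
  (start, end)
}

// After a successful extract, the old end becomes the next start.
def nextRange(current: (Long, Long), newEnd: Long): (Long, Long) =
  (current._2, newEnd)

def formatRange(r: (Long, Long)): String = s"${r._1},${r._2}"

// val line = /* read extractRange.conf from HDFS */
val current = parseRange("0,1000")
val updated = nextRange(current, 2500L)
// /* write formatRange(updated) back to HDFS before the job finishes */
println(formatRange(updated))
```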
>
> On Tue, Jun 7, 2016 at 8:49 PM, Ajay C
Hi Everyone, Any insights on this thread? Thank you.
On Friday, May 27, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi Everyone,
>
> I have some data located on the EdgeNode. Right
> now, the process I follow to copy the data from Edgenode
Hi Vikash,
These are my thoughts: read the input directory using wholeTextFiles(),
which gives a paired RDD with the key as the file name and the value as the
file content. Then you can apply a map function to read each line and
append the key to the content.
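A sketch of that map step. The pair-handling logic is pure, so it can be shown without a cluster; the tab separator is an assumption:

```scala
// wholeTextFiles yields an RDD[(String, String)] of (fileName, fileContent).
def tagLines(fileName: String, content: String): Seq[String] =
  content.split("\n").toSeq.map(line => s"$fileName\t$line")

// On a cluster (sketch):
//   sc.wholeTextFiles(inputDir).flatMap { case (name, text) => tagLines(name, text) }
println(tagLines("part-0000.txt", "a\nb"))
```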
Thank you,
Aj
On Tuesday, May 31, 2016, Vikash
Hi Everyone,
I have some data located on the EdgeNode. Right
now, the process I follow to copy the data from the Edgenode to HDFS is
through a shell script which resides on the Edgenode. In Oozie I am using
an SSH action to execute the shell script on the Edgenode, which copies the
ve.
>
> Cheers,
>
>
>
> Dr Mich Talebzadeh
>
> http://talebzadehmich
Hi Mich,
Right now I have a similar use case where I have to delete some rows from a
Hive table. My Hive table is of type ORC, bucketed, and has the
transactional property set. I can delete from the Hive shell but not from my
spark-shell or Spark app. Were you able to find any workaround? Thank you.
Hi Everyone, a quick question within this context: what is the underlying
persistent storage that you guys are using with regards to this
containerized environment? Thanks
On Thursday, March 10, 2016, yanlin wang wrote:
> How you guys make driver docker within container to be
Mich,
Can you try the value for paymentdata in this
format, paymentdata='2015-01-01 23:59:59', to_date(paymentdate), and see if
it helps.
On Thursday, March 24, 2016, Tamas Szuromi
wrote:
> Hi Mich,
>
> Take a look
>
Hi Ashok,
Try using hiveContext instead of sqlContext. I suspect sqlContext does not
have that functionality. Let me know if it works.
Thanks,
Ajay
On Friday, March 4, 2016, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:
> Hi Ayan,
>
> Thanks for the response. I am using SQL
Hi Everyone,
I am building a Java Spark application in the Eclipse IDE. From my
application I want to use hiveContext to read tables from the remote
Hive (Hadoop cluster). On my machine I have exported $HADOOP_CONF_DIR =
{$HOME}/hadoop/conf/. This path has all the remote cluster conf details
like
Regards,
Aj
On Monday, May 23, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi Everyone,
>
> I am building a Java Spark application in eclipse IDE. From my application
> I want to use hiveContext to read tables from the remote Hive(Hadoop
> cluster). On my machine I hav
> This way we can narrow down where the issue is?
>
>
> Sent from my iPhone
>
> On May 23, 2016, at 5:26 PM, Ajay Chander <itsche...@gmail.com> wrote:
>
I downloaded the Spark 1.5 utilities and exported
kmc...@gmail.com> wrote:
> Hi Ajay
> You can look at the wholeTextFiles method, which gives an
> RDD[(String, String)], and then map each element and saveAsTextFile.
> This will serve the purpose.
> I don't think anything like distcp exists by default in Spark
>
> Thanks
> Deepak
> On 10
it. Is there any possible/efficient way to achieve this?
Thanks,
Aj
On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> I will try that out. Thank you!
>
> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com>
Never mind! I figured it out by saving it as a Hadoop file and passing the
codec to it. Thank you!
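For anyone landing here later, the codec-passing overload does exist on RDD, e.g. rdd.saveAsTextFile(path, classOf[org.apache.hadoop.io.compress.GzipCodec]). The local gzip roundtrip below only illustrates what such a codec does to each partition's bytes; it is not the Spark call itself:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// Compress a string the way a gzip codec compresses each output partition.
def gzip(s: String): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val gz = new GZIPOutputStream(bos)
  gz.write(s.getBytes("UTF-8"))
  gz.close()
  bos.toByteArray
}

def gunzip(bytes: Array[Byte]): String = {
  val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
  scala.io.Source.fromInputStream(in, "UTF-8").mkString
}

// On a cluster (sketch):
//   rdd.saveAsTextFile("hdfs://.../out", classOf[org.apache.hadoop.io.compress.GzipCodec])
println(gunzip(gzip("line1\nline2")))
```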
On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi, I have a folder temp1 in HDFS which has multiple files of different
> formats, test1.txt, test2.avsc (Avro file), in it
Hi Everyone,
we are planning to migrate data between 2 clusters and I see distcp
doesn't support data compression. Is there any efficient way to compress
the data during the migration? Can I implement a Spark job to do this?
Thanks.
I will try that out. Thank you!
On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
> Yes that's what I intended to say.
>
> Thanks
> Deepak
> On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com>
Hello Everyone,
My goal is to use Spark SQL to load a huge amount of data from Oracle to HDFS.
*Table in Oracle:*
1) No primary key.
2) Has 404 columns.
3) Has 200,800,000 rows.
*Spark SQL:*
In my Spark SQL I want to read the data into n partitions in
parallel, for which I need to
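Since the table has no primary key to use as a partitionColumn, one common workaround is to supply one predicate per partition via the DataFrameReader.jdbc(url, table, predicates, connectionProperties) overload. The ORA_HASH-on-ROWID split below is an assumption about what works for this particular Oracle table, not a tested recipe:

```scala
// Build one WHERE-clause predicate per partition; Spark issues one query each.
def rowidPredicates(numPartitions: Int): Array[String] =
  (0 until numPartitions)
    .map(i => s"MOD(ORA_HASH(ROWID), $numPartitions) = $i")
    .toArray

// On a cluster (sketch; jdbcUrl, table name and props are hypothetical):
//   val df = sqlContext.read.jdbc(jdbcUrl, "BIG_TABLE", rowidPredicates(16), connProps)
println(rowidPredicates(4).mkString("\n"))
```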
zadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Strange that Oracle table of 200Million plus rows has not been
>>>>>>> partitioned.
>>>>>>>
>>>>>>> What matter
If you can create the dataframe in main, you can register it as a table
> and run the queries in the main method itself. You don't need to coalesce
> or run the method within foreach.
>
> Regards
> Sunita
>
> On Tuesday, October 25, 2016, Ajay Chander <itsche...@gmail.com> wrote:
Hi Everyone,
I was thinking if I can use hiveContext inside foreach like below,
object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    val dataElementsFile = args(0)
    val deDF =
> In your sample code, you can use hiveContext in the foreach as it is a
> Scala List foreach operation which runs on the driver side. But you cannot
> use hiveContext in RDD.foreach
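To make that distinction concrete, a small sketch (table and query shape are hypothetical): a plain Scala List's foreach runs on the driver, where hiveContext exists; RDD.foreach runs on executors, where it does not:

```scala
// Driver-side loop body: just builds the SQL the driver would hand to hiveContext.sql(...)
def elementQuery(dataElement: String): String =
  s"SELECT * FROM src WHERE element = '$dataElement'"

val dataElements = List("DE1", "DE2")  // e.g. obtained via deDF.collect() on the driver
// dataElements.foreach(de => hiveContext.sql(elementQuery(de)))  // OK: Scala List, driver side
// someRDD.foreach(de => hiveContext.sql(elementQuery(de)))       // NOT OK: executor side
println(dataElements.map(elementQuery))
```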
>
>
>
> Ajay Chander <itsche...@gmail.com> wrote on Wed, Oct 26, 2016 at 11:28 AM:
>
>> Hi Everyone,
> The NPE you see is an unrelated cosmetic problem that was fixed in 2.0.1
> IIRC.
>
> On Wed, Oct 26, 2016 at 4:28 AM Ajay Chander <itsche...@gmail.com> wrote:
>
>> Hi Everyone,
>>
>> I was thinking if I
Hi Everyone,
I am trying to develop a simple codebase on my machine to read data from a
secured Hadoop cluster. We have a development cluster which is secured
through Kerberos, and I want to run a Spark job from my IntelliJ to read
some sample data from the cluster. Has anyone done this before? Can
tion: Can't get Master
Kerberos principal for use as renewer
sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)
// Getting this error: java.io.IOException: Can't get Master
// Kerberos principal for use as renewer
}
}
On Mon, Nov 7, 2016
Did anyone use
https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
to interact with secured Hadoop from Spark ?
Thanks,
Ajay
On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander <itsche...@gmail.com> wrote:
>
> Hi Everyo
").mode("Append"
).insertInto("devl_df2_spf_batch.spf_supplier_trans_metric_detl_base_1")
}
}
}
This is my cluster (Spark 1.6.0 on YARN, Cloudera 5.7.1) configuration:
Memory -> 4.10 TB
VCores -> 544
I am deploying the application in yarn-client mode and the cluste