Re: partitioning json data in spark
Well, I could try to do that, but the *partitionBy* method is anyway only supported for the Parquet format, even in Spark 1.5.1.

Narek Galstyan
Նարեկ Գալստյան

On 27 December 2015 at 21:50, Ted Yu wrote:
> Is upgrading to 1.5.x a possibility for you?
>
> Cheers
>
> On Sun, Dec 27, 2015 at 9:28 AM, Նարեկ Գալստեան wrote:
>> http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
>> I did try, but it was all in vain. It is also explicitly written in the API docs that it only supports Parquet.
>>
>> Narek Galstyan
>> Նարեկ Գալստյան
>>
>> On 27 December 2015 at 17:52, Igor Berman wrote:
>>> Have you tried to specify the format of your output? Parquet might be the default format:
>>> df.write().format("json").mode(SaveMode.Overwrite).save("/tmp/path");
>>>
>>> On 27 December 2015 at 15:18, Նարեկ Գալստեան wrote:
>>>> Hey all!
>>>> I want to partition *json* data by a column name and store the result as a collection of json files to be loaded into another database.
>>>> I could use Spark's built-in *partitionBy* function, but it only outputs in Parquet format, which is not desirable for me.
>>>> Could you suggest a way to deal with this problem?
>>>>
>>>> Narek Galstyan
>>>> Նարեկ Գալստյան
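One possible workaround, sketched below in Scala (it assumes a DataFrame df that has already been loaded, e.g. via sqlContext.read.json(...), and uses a hypothetical partition column named "country" and a hypothetical output path — none of these come from the thread): write one JSON directory per distinct value of the column, mimicking the key=value layout that partitionBy produces for Parquet.

import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch only: the DataFrame, column name and base path are assumptions.
def writeJsonByColumn(df: DataFrame, partitionCol: String, basePath: String): Unit = {
  // collect the distinct partition values on the driver (assumes their number is modest)
  val values = df.select(partitionCol).distinct().collect().map(_.get(0))
  values.foreach { v =>
    df.filter(df(partitionCol) === v)       // keep only the rows for this partition value
      .write
      .format("json")                       // force JSON output instead of the Parquet default
      .mode(SaveMode.Overwrite)
      .save(s"$basePath/$partitionCol=$v")  // e.g. /output/json/country=AM
  }
}

This trades one full write for one filtered pass per distinct value, so it is mainly attractive when the partition column has relatively few values.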
Re: partitioning json data in spark
http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
I did try, but it was all in vain. It is also explicitly written in the API docs that it only supports Parquet.

Narek Galstyan
Նարեկ Գալստյան

On 27 December 2015 at 17:52, Igor Berman wrote:
> Have you tried to specify the format of your output? Parquet might be the default format:
> df.write().format("json").mode(SaveMode.Overwrite).save("/tmp/path");
>
> On 27 December 2015 at 15:18, Նարեկ Գալստեան wrote:
>> Hey all!
>> I want to partition *json* data by a column name and store the result as a collection of json files to be loaded into another database.
>> I could use Spark's built-in *partitionBy* function, but it only outputs in Parquet format, which is not desirable for me.
>> Could you suggest a way to deal with this problem?
>>
>> Narek Galstyan
>> Նարեկ Գալստյան
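A side note on the Parquet-only limitation: the linked page is the 1.4.1 documentation, and later DataFrameWriter docs note that partitionBy, initially Parquet-only, covers JSON (along with text, ORC and Avro) from 1.5 onwards. So on a 1.5+ cluster the direct form may be worth trying — a one-line hedged sketch, with the same hypothetical column name and path as above:

df.write.partitionBy("country").format("json").save("/output/json")  // 1.5+, per the later docs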
partitioning json data in spark
Hey all!

I want to partition *json* data by a column name and store the result as a collection of json files to be loaded into another database.

I could use Spark's built-in *partitionBy* function, but it only outputs in Parquet format, which is not desirable for me.

Could you suggest a way to deal with this problem?

Narek Galstyan
Նարեկ Գալստյան
Re: Debug Spark
A question regarding the topic: I am using IntelliJ to write Spark applications and then have to ship the source code to my cluster in the cloud to compile and test. Is there a way to automate the process using IntelliJ?

Narek Galstyan
Նարեկ Գալստյան

On 29 November 2015 at 20:51, Ndjido Ardo BAR wrote:
> Masf, the following link sets out the basics to start debugging your Spark apps in local mode:
>
> https://medium.com/large-scale-data-processing/how-to-kick-start-spark-development-on-intellij-idea-in-4-steps-c7c8f5c2fe63#.675s86940
>
> Ardo
>
> On Sun, Nov 29, 2015 at 5:34 PM, Masf wrote:
>> Hi Ardo
>>
>> Is there a tutorial for debugging with IntelliJ?
>>
>> Thanks
>>
>> Regards.
>> Miguel.
>>
>> On Sun, Nov 29, 2015 at 5:32 PM, Ndjido Ardo BAR wrote:
>>> hi,
>>>
>>> IntelliJ is just great for that!
>>>
>>> cheers,
>>> Ardo.
>>>
>>> On Sun, Nov 29, 2015 at 5:18 PM, Masf wrote:
>>>> Hi
>>>>
>>>> Is it possible to debug Spark locally with IntelliJ or another IDE?
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Regards.
>>>> Miguel Ángel
>>
>> --
>> Regards.
>> Miguel Ángel
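For the local-debugging part of the thread, a minimal sketch (object name and sample path are hypothetical): with the master set to local[*], the whole job runs in a single JVM, so ordinary IntelliJ breakpoints work.

import org.apache.spark.{SparkConf, SparkContext}

object DebugLocally {
  def main(args: Array[String]): Unit = {
    // local[*] keeps driver and executors in one process, which is what makes IDE debugging possible
    val conf = new SparkConf().setAppName("debug-locally").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val counts = sc.textFile("src/test/resources/sample.txt")  // hypothetical small sample file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.take(10).foreach(println)  // set a breakpoint here and inspect the partial results
    sc.stop()
  }
}

For the cluster round trip asked about above, one common (if unglamorous) pattern, not specific to this thread, is to build a fat jar (e.g. with sbt-assembly or a Maven shade/assembly setup) and submit it with spark-submit; that step can be wired into IntelliJ as an external tool or a "Before launch" task so a single run configuration rebuilds and ships the code.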
Re: get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")
Well, I do not really need to do it while another job is editing them. I just need to get the names of the folders when I read through textFile("path/to/dir/*/*/*.js").

Using *native hadoop* libraries, can I do something like fs.copy("/my/path/*/*", "new/path/")?

Narek Galstyan
Նարեկ Գալստյան

On 27 October 2015 at 19:13, Deenar Toraskar wrote:
> This won't work, as you can never guarantee which files were read by Spark if some other process is writing files to the same location. It would be far less work to move files matching your pattern to a staging location and then load them using sc.textFile. You should find HDFS file system calls that are equivalent to the normal file system ones if command-line tools like distcp or mv don't meet your needs.
>
> On 27 Oct 2015 1:49 p.m., "Նարեկ Գալստեան" wrote:
>> Dear Spark users,
>>
>> I am reading a set of json files to compile them into the Parquet data format. I would like to mark the folders in some way after having read their contents, so that I do not read them again (e.g. I could change the name of the folder).
>>
>> I use the .textFile("path/to/dir/*/*/*.js") technique to *automatically* detect the files. I cannot, however, use the same notation *to rename them*.
>>
>> Could you suggest how I can *get the names of these folders* so that I can rename them using native hadoop libraries?
>>
>> I am using Apache Spark 1.4.1.
>>
>> I look forward to hearing suggestions!
>>
>> yours,
>> Narek
>>
>> Նարեկ Գալստյան
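There is no fs.copy(...) call in the Hadoop FileSystem API as written above, but the same effect can be had with globStatus plus rename (or FileUtil.copy for an actual copy). A hedged sketch in Scala, with hypothetical paths:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// inside a Spark job, FileSystem.get(sc.hadoopConfiguration) picks up the cluster's fs.defaultFS
val fs = FileSystem.get(new Configuration())

// expand the same glob that textFile uses and keep the parent folder of every matched file
val matched = fs.globStatus(new Path("/path/to/dir/*/*/*.js"))
val dirs = matched.map(_.getPath.getParent).distinct

// mark each matched folder, e.g. by appending a suffix to its name (moving it out of the
// watched tree with the same rename call works too)
dirs.foreach { dir =>
  fs.rename(dir, new Path(dir.getParent, dir.getName + "_processed"))
}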
get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")
Dear Spark users,

I am reading a set of json files to compile them into the Parquet data format. I would like to mark the folders in some way after having read their contents, so that I do not read them again (e.g. I could change the name of the folder).

I use the .textFile("path/to/dir/*/*/*.js") technique to *automatically* detect the files. I cannot, however, use the same notation *to rename them*.

Could you suggest how I can *get the names of these folders* so that I can rename them using native hadoop libraries?

I am using Apache Spark 1.4.1.

I look forward to hearing suggestions!

yours,
Narek

Նարեկ Գալստյան
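Another angle on the same question, sketched below with hypothetical paths: sc.wholeTextFiles returns (path, content) pairs, so the folders that were actually read can be recovered from the RDD itself rather than listed separately. The caveat is that each file is read whole into memory, which only suits reasonably small json files.

import org.apache.hadoop.fs.Path

// assumes an existing SparkContext sc; the glob is the same one used with textFile
val files = sc.wholeTextFiles("path/to/dir/*/*/*.js")

// keys are the full file paths; take each file's parent directory and deduplicate
val readDirs = files.keys
  .map(p => new Path(p).getParent.toString)
  .distinct()
  .collect()

// readDirs can then be renamed with the Hadoop FileSystem API, as in the sketch above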
Interactively search Parquet-stored data using Spark Streaming and DataFrames
I have a significant amount of data stored on my Hadoop HDFS as Parquet files.

I am using Spark Streaming to interactively receive queries from a web server and transform the received queries into SQL to run on my data using Spark SQL. In this process I need to run several SQL queries and then return some aggregate result by merging or subtracting the results of the individual queries.

Are there any ways I could optimize and increase the speed of the process by, for example, running queries on already received DataFrames rather than on the whole database?

Is there a better way to interactively query the Parquet-stored data and return results?

Thank you!

Narek Galstyan
Նարեկ Գալստյան
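One way to act on the "already received DataFrames" idea, sketched below with hypothetical names and paths: read the Parquet data once, cache it, register it as a temporary table, and let every query that arrives through the streaming job run against the cached table instead of re-reading HDFS.

import org.apache.spark.sql.SQLContext

// assumes an existing SparkContext sc
val sqlContext = new SQLContext(sc)

// load the Parquet files once and keep them in memory across queries
val events = sqlContext.read.parquet("hdfs:///data/events")  // hypothetical path
events.cache()
events.registerTempTable("events")                            // hypothetical table name

// for each SQL string received from the web server via the streaming job:
val perQuery = sqlContext.sql("SELECT category, COUNT(*) AS cnt FROM events GROUP BY category")
val rows = perQuery.collect()  // merge or subtract the per-query results on the driver

The first action after cache() pays the load cost; subsequent queries then hit the in-memory columnar data, which is usually where the interactive speed-up comes from.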