I am trying to run Google Dataflow code on Spark. It works fine as a Google
Dataflow job on Google Cloud Platform, but while running on Spark I get the
following error:
16/11/02 11:14:32 INFO com.cloudera.dataflow.spark.SparkPipelineRunner:
Evaluating ParDo(GroupByKeyHashAndSortByKeyAndWindow)
http://stackoverflow.com/questions/36382052/converting-list-to-column-in-spark
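For the question quoted below, a minimal Scala sketch of one way to build a DataFrame column from a list (assuming Spark 1.x with a SQLContext in scope; names are illustrative):

import sqlContext.implicits._

// A list of tuples becomes a DataFrame with one column per tuple field;
// wrap single values in Tuple1 for a one-column frame.
val df = List(("a", 1), ("b", 2), ("c", 3)).toDF("name", "count")
df.show()

Attaching a list as a new column of an existing DataFrame generally requires pairing rows by index first (e.g. zipWithIndex on both sides, then a join).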
On Fri, Jul 22, 2016 at 5:15 PM, Divya Gehlot
wrote:
> Hi,
> Can somebody help me create a DataFrame column from a Scala list?
> Would really appreciate the help.
>
> Thanks,
>
> …partitioning and indexing in ORC it's blazing
> fast (querying 64 million rows x 570 columns takes 19 seconds). There is perhaps
> a reason why Spark makes things slow while using ORC :)
>
>
> Regards,
> Gourav
>
> On Thu, Jul 21, 2016 at 12:40 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com> wrote:
> …your/folder/*.json"
> All files will be loaded into a DataFrame, and the schema will be the union
> of the different schemas of your JSON files (if they differ).
> It should work - let me know
>
> Simone Miraglia
> ------
>
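A minimal Scala sketch of Simone's suggestion, extended to the date-wise folders and SQL queries asked about below (paths hypothetical; assuming Spark 1.x with a SQLContext):

// Glob over date-wise folders; the schema is the union across all files
val events = sqlContext.read.json("hdfs:///data/events/2016-07-*/*.json")

// Register the DataFrame and query it with SQL
events.registerTempTable("events")
sqlContext.sql("SELECT COUNT(*) FROM events").show()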
> …sql-programming-guide.html#json-datasets
>
> Hope it helps
>
> Simone Miraglia
> ------
> From: Ashutosh Kumar <kmr.ashutos...@gmail.com>
> Sent: 21/07/2016 08:19
> To: user @spark <user@spark.apache.org>
> Subject: Reading multiple json
There is no database. I read files from Google Cloud Storage / S3 / HDFS.
Thanks
Ashutosh
On Thu, Jul 21, 2016 at 11:50 AM, Sree Eedupuganti wrote:
> Which database are you using?
>
I need to read a bunch of JSON files kept in date-wise folders and perform
SQL queries on them using a DataFrame. Is it possible to do so? Please
provide some pointers.
Thanks
Ashutosh
model.setRandomCenters takes two arguments in Scala, whereas the Java method
needs 3?
Any clues?
Thanks
Ashutosh
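If this refers to MLlib's StreamingKMeans, a likely explanation (stated here as an assumption, not from the thread): the Scala signature setRandomCenters(dim, weight, seed) gives seed a default value, and Scala default arguments are not visible from Java, so Java callers must pass all three. A sketch:

import org.apache.spark.mllib.clustering.StreamingKMeans

val model = new StreamingKMeans()
  .setK(2)
  .setDecayFactor(1.0)
  .setRandomCenters(3, 0.0) // Scala: seed defaults to a random value
// From Java the default is unavailable: model.setRandomCenters(3, 0.0, 42L);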
On Wed, Apr 27, 2016 at 9:59 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com>
wrote:
> The problem seems to be streamingContext.textFileStream(path) is not reading
> the file at all. It does n…
Thanks
Ashutosh
On Wed, Apr 27, 2016 at 2:43 PM, Niki Pavlopoulou <n...@exonar.com> wrote:
> One of the reasons that happened to me (assuming everything is ok on your
> streaming process), is if you run it on local mode instead of local[*] use
> local[4].
>
> On 26 April 2016 …
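A minimal sketch of the suggested setup (app name and path hypothetical). Note also that textFileStream only detects files created in the watched directory after the context starts, which is another common reason nothing is read:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Plain "local" runs a single thread; local[*] uses all available cores
val conf = new SparkConf().setAppName("FileStreamDemo").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(10))

// Only files that appear in the directory after start are picked up
ssc.textFileStream("/tmp/incoming").print()
ssc.start()
ssc.awaitTermination()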
I see there is a library, spark-csv, which can be used for removing the header
and processing CSV files. But it seems it works with SQLContext only. Is
there a way to remove the header from CSV files without SQLContext?
Thanks
Ashutosh
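One common approach without SQLContext is to drop the first line of the first partition using plain RDD operations (a sketch; assumes the header is a single line at the top of the file, so it sits in partition 0):

// sc is an existing SparkContext; the path is illustrative
val lines = sc.textFile("/data/input.csv")
val noHeader = lines.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1) else iter
}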
I created a streaming k-means job based on the Scala example. It keeps running
without any error but never prints predictions.
Here is the log:
19:15:05,050 INFO
org.apache.spark.streaming.scheduler.InputInfoTracker - remove old
batch metadata: 146167824 ms
19:15:10,001 INFO
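For reference, the wiring in the Scala streaming k-means example looks roughly like this (directories hypothetical). If no files ever arrive on the test stream, the job keeps running but print() never shows predictions, which matches the behaviour described:

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors

// Files must be dropped into these directories after the context starts
val trainingData = ssc.textFileStream("/tmp/train").map(Vectors.parse)
val testData = ssc.textFileStream("/tmp/test").map(s => (s, Vectors.parse(s)))

val model = new StreamingKMeans()
  .setK(2)
  .setDecayFactor(1.0)
  .setRandomCenters(3, 0.0)
model.trainOn(trainingData)
model.predictOnValues(testData).print() // silent if no test batches arrive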
Just out of curiosity, I would like to know why a streaming program should
shut down when no new data is arriving. I think it should keep waiting for
new records to arrive.
Thanks
Ashutosh
On Tue, Feb 23, 2016 at 9:17 PM, Hemant Bhanawat
wrote:
> A guess - parseRecord is
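The quoted guess is cut off, but if the suspicion is that a user function such as parseRecord throws on malformed input (an uncaught exception ends the application rather than leaving it idle), a defensive wrapper keeps the stream alive. parseRecord here is a hypothetical stand-in, and lines is assumed to be a DStream[String]:

import scala.util.Try

// Hypothetical parser standing in for the thread's parseRecord
def parseRecord(line: String): (String, Int) = {
  val Array(k, v) = line.split(",", 2)
  (k, v.trim.toInt) // throws on malformed input
}

// Drop bad records instead of letting one exception kill the job
val parsed = lines.flatMap(line => Try(parseRecord(line)).toOption)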
> On Tue, Feb 16, 2016 at 4:19 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com>
> wrote:
>
>> Hi Chandeep,
>> Thanks for the response. The issue is that the newline feed is lost. All
>> records appear on one line only.
>>
>> Thanks
>> Ashutosh
>>
>> On Tue, Feb 16,
On Feb 16, 2016, at 9:33 AM, Ashutosh Kumar <kmr.ashutos...@gmail.com>
wrote:
>
> I am getting multiple empty files in the streaming output for each interval.
> To avoid this I tried
>
> kStream.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
> …
I am getting multiple empty files in the streaming output for each interval.
To avoid this I tried:

kStream.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
  public void call(JavaRDD<String> rdd, Time time) throws Exception {
    // write the batch only when it is non-empty (output path illustrative)
    if (!rdd.isEmpty()) rdd.saveAsTextFile("out-" + time.milliseconds());
  }
});
Requesting some pointers on this.
Thanks
On Mon, Feb 15, 2016 at 3:39 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com>
wrote:
> I am getting multiple empty files in the streaming output for each interval.
> To avoid this I tried:
>
> kStream.foreachRDD(new VoidFunc…
I am looking for an easy-to-use visualization tool for the KMeansModel
produced as a result of clustering.
Thanks
Ashutosh
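One low-tech route, assuming an already-trained MLlib KMeansModel: export the cluster centers to CSV on the driver and plot them in any external tool (file name illustrative):

import java.io.PrintWriter

// model: org.apache.spark.mllib.clustering.KMeansModel, assumed trained
val out = new PrintWriter("centers.csv")
model.clusterCenters.foreach(c => out.println(c.toArray.mkString(",")))
out.close()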