Re: How Fault Tolerance is achieved in Spark ??
Hi Nikhil, Fault tolerance means that data is not lost in case of failures, and it is achieved differently in different systems. In HDFS, fault tolerance is achieved by replicating blocks across different nodes. In Spark, fault tolerance is achieved through the DAG. Let me put it in simple words: you create RDD1 by reading data from HDFS, then apply a couple of transformations, producing RDD1 --> RDD2 --> RDD3. Let's assume you have cached RDD3, and after some time RDD3 is evicted from the cache to make room for a newly created and cached RDD4. Now you want to access RDD3, which is no longer in the cache, so Spark uses the DAG (the recorded lineage of transformations) to recompute RDD3. In this way the data in RDD3 is always available. Hope this answers your question. Thank you, Naresh On Tue, Dec 12, 2017 at 12:51 AM wrote: > Hello Techies, > > How is fault tolerance achieved in Spark when data is read from HDFS and is in the form of an RDD (in memory)? > > Regards > Nikhil
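The lineage idea above can be sketched like this (a minimal illustration, assuming a running SparkContext `sc`; the path and transformations are made up for the example):

```scala
// rdd1 -> rdd2 -> rdd3 forms a lineage (DAG) that Spark records.
val rdd1 = sc.textFile("hdfs:///data/input.txt") // hypothetical path
val rdd2 = rdd1.map(_.toUpperCase)
val rdd3 = rdd2.filter(_.nonEmpty)

rdd3.cache() // ask Spark to keep rdd3 in memory
rdd3.count() // this action materializes and caches it

rdd3.unpersist() // simulate rdd3 being evicted from the cache

// rdd3 is no longer cached, so this action replays the recorded
// lineage (read -> map -> filter) to recompute the lost partitions.
rdd3.count()
```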
Access Array StructField inside StructType.
Hi All, How do I iterate over the StructFields nested inside the *after* field of this schema? StructType(StructField(after, StructType(StructField(Alarmed,LongType,true), StructField(CallDollarLimit,StringType,true), StructField(CallRecordWav,StringType,true), StructField(CallTimeLimit,LongType,true), StructField(Signature,StringType,true)), true)) Regards, Satyajit.
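One way to walk the nested struct (a sketch, not a definitive answer; it assumes `schema` is the StructType above, e.g. obtained from `df.schema`):

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Look up the `after` field by name and, if it is itself a struct,
// iterate over the fields of that nested struct.
def printNestedFields(schema: StructType): Unit = {
  schema("after").dataType match {
    case nested: StructType =>
      nested.fields.foreach { f =>
        println(s"${f.name}: ${f.dataType} (nullable=${f.nullable})")
      }
    case other =>
      println(s"`after` is not a struct but $other")
  }
}
```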
Re: How do I save the dataframe data as a pdf file?
No problem. Assuming your data has been collected as "A = Array[Array[Double]]", something along the lines of "A.map(x => x.mkString(" & ")).mkString(" \n")" should do the trick. Another, somewhat more convoluted, option would be to write your data as a CSV or other delimited text file and then write a small Python/R wrapper which consumes those and writes tex tables. Anthony On Tue, Dec 12, 2017 at 11:38 AM, anna staxwrote: > Thanks Anthony for the response. > > Yes, the data in the dataframe represents a report and I want to create > pdf files. > I am using scala so hoping to find a easier solution in scala, if not I > will try out your suggestion . > > > On Tue, Dec 12, 2017 at 11:29 AM, Anthony Thomas > wrote: > >> Are you trying to produce a formatted table in a pdf file where the >> numbers in the table come from a dataframe? I.e. to present summary >> statistics or other aggregates? If so I would guess your best bet would be >> to collect the dataframe as a Pandas dataframe and use the to_latex method. >> You can then use a standard latex compiler to produce a pdf with a table >> containing that data. I don't know if there's any comparable built-in for >> Scala, but you could always collect the data as an array of arrays and >> write these to a tex file using standard IO. Maybe someone has an easier >> suggestion. >> >> On Tue, Dec 12, 2017 at 11:12 AM, shyla deshpande < >> deshpandesh...@gmail.com> wrote: >> >>> Hello all, >>> >>> Is there a way to write the dataframe data as a pdf file? >>> >>> Thanks >>> -Shyla >>> >> >> >
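Spelled out, the mkString approach might look like this (a sketch; the surrounding tabular environment and its two-column alignment spec are assumptions on my part):

```scala
// Turn a collected Array[Array[Double]] into a LaTeX tabular body.
val a: Array[Array[Double]] = Array(Array(1.0, 2.0), Array(3.0, 4.0))

// " & " separates columns, "\\" (escaped here) ends each LaTeX table row.
val body = a.map(row => row.mkString(" & ")).mkString(" \\\\\n")

// Wrap the body in a hypothetical two-column tabular environment.
val table = "\\begin{tabular}{rr}\n" + body + " \\\\\n\\end{tabular}"

println(table)
```

Writing `table` to a .tex file with standard IO and compiling it with pdflatex would then give the PDF table.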
Re: How do I save the dataframe data as a pdf file?
Thanks Anthony for the response. Yes, the data in the dataframe represents a report and I want to create pdf files. I am using Scala, so I am hoping to find an easier solution in Scala; if not, I will try out your suggestion. On Tue, Dec 12, 2017 at 11:29 AM, Anthony Thomaswrote: > Are you trying to produce a formatted table in a pdf file where the > numbers in the table come from a dataframe? I.e. to present summary > statistics or other aggregates? If so I would guess your best bet would be > to collect the dataframe as a Pandas dataframe and use the to_latex method. > You can then use a standard latex compiler to produce a pdf with a table > containing that data. I don't know if there's any comparable built-in for > Scala, but you could always collect the data as an array of arrays and > write these to a tex file using standard IO. Maybe someone has an easier > suggestion. > > On Tue, Dec 12, 2017 at 11:12 AM, shyla deshpande < > deshpandesh...@gmail.com> wrote: > >> Hello all, >> >> Is there a way to write the dataframe data as a pdf file? >> >> Thanks >> -Shyla >> > >
Re: Json to csv
I was curious about this too, and found this. You may find it helpful: http://www.tegdesign.com/converting-a-nested-json-document-to-csv-using-scala-hadoop-and-apache-spark/ Thanks, Subhash Sent from my iPhone > On Dec 12, 2017, at 1:44 AM, Prabha Kwrote: > > Any help on converting json to csv, or flattening the json file? The json file has > one struct and multiple arrays. > Thanks > Pk > > Sent from my iPhone > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org >
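As a rough sketch of the Spark-side approach (column and file names here are assumptions, since the original json layout isn't shown; each array element becomes its own row via explode, and struct fields are flattened with dotted column paths):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().appName("JsonToCsv").getOrCreate()

// Read the nested JSON document.
val df = spark.read.json("input.json") // hypothetical file

val flat = df
  .withColumn("item", explode(col("items"))) // `items` is an assumed array column
  .select(col("id"), col("item.name"), col("item.value")) // assumed nested fields

// CSV cannot hold structs/arrays, so only flattened atomic columns are written.
flat.write.option("header", "true").csv("output_csv")
```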
Re: How do I save the dataframe data as a pdf file?
Are you trying to produce a formatted table in a pdf file where the numbers in the table come from a dataframe? I.e. to present summary statistics or other aggregates? If so I would guess your best bet would be to collect the dataframe as a Pandas dataframe and use the to_latex method. You can then use a standard latex compiler to produce a pdf with a table containing that data. I don't know if there's any comparable built-in for Scala, but you could always collect the data as an array of arrays and write these to a tex file using standard IO. Maybe someone has an easier suggestion. On Tue, Dec 12, 2017 at 11:12 AM, shyla deshpandewrote: > Hello all, > > Is there a way to write the dataframe data as a pdf file? > > Thanks > -Shyla >
How do I save the dataframe data as a pdf file?
Hello all, Is there a way to write the dataframe data as a pdf file? Thanks -Shyla
Re: RDD[internalRow] -> DataSet
not possible directly, but you can add your own object in your project under Spark's package, which gives you access to package-private methods:

package org.apache.spark.sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.StructType

object DataFrameUtil {
  /**
   * Creates a DataFrame out of the RDD[InternalRow] that you can get
   * using `df.queryExecution.toRdd`.
   */
  def createFromInternalRows(sparkSession: SparkSession, schema: StructType, rdd: RDD[InternalRow]): DataFrame = {
    val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession)
    Dataset.ofRows(sparkSession, logicalPlan)
  }
}
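A hypothetical round trip using that DataFrameUtil helper (assuming a SparkSession `spark` and an existing DataFrame `df`):

```scala
import org.apache.spark.sql.DataFrame

// Drop down to the internal representation and rebuild a DataFrame from it.
// This compiles only because DataFrameUtil was placed inside the
// org.apache.spark.sql package, alongside the private APIs it calls.
val internalRows = df.queryExecution.toRdd // RDD[InternalRow]
val rebuilt: DataFrame =
  org.apache.spark.sql.DataFrameUtil.createFromInternalRows(spark, df.schema, internalRows)

rebuilt.show()
```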
Re: Union of RDDs Hung
Can you show us the code? On Tue, Dec 12, 2017 at 9:02 AM, Vikash Pareekwrote: > Hi All, > > I am unioning 2 rdds(each of them having 2 records) but this union it is > getting hang. > I found a solution to this that is caching both the rdds before performing > union but I could not figure out the root cause of hanging the job. > > Is somebody knows why this happens with union? > > Spark version I am using is 1.6.1 > > > Best Regards, > Vikash Pareek > > > > - > > __Vikash Pareek > -- > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ > > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > >
Union of RDDs Hung
Hi All, I am unioning 2 RDDs (each of them having 2 records), but the union is hanging. I found a workaround, which is caching both RDDs before performing the union, but I could not figure out the root cause of the job hanging. Does anybody know why this happens with union? The Spark version I am using is 1.6.1. Best Regards, Vikash Pareek - __Vikash Pareek -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
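The workaround described above might look like this (a sketch, assuming a SparkContext `sc`; the RDD contents are illustrative, and this shows the caching workaround rather than explaining the hang):

```scala
// Two small RDDs; caching each before the union was the reported workaround.
val rdd1 = sc.parallelize(Seq(1, 2))
val rdd2 = sc.parallelize(Seq(3, 4))

rdd1.cache()
rdd2.cache()
// Force the caches to materialize before the union.
rdd1.count()
rdd2.count()

// union is a narrow, metadata-only transformation; the actual work
// happens when an action such as collect() runs.
val unioned = rdd1.union(rdd2)
println(unioned.collect().mkString(","))
```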