Re: How is Fault Tolerance achieved in Spark?

2017-12-12 Thread Naresh Dulam
Hi Nikhil,


Fault tolerance means that data is not lost in case of failures, and it is
achieved differently in different systems.
In HDFS, fault tolerance is achieved by replicating blocks across different
nodes.
In Spark, fault tolerance is achieved through the DAG (the lineage of
transformations). Let me put it in simple words:
You create RDD1 by reading data from HDFS, then apply a couple of
transformations to create two new RDDs:

RDD1 --> RDD2 --> RDD3

Let's assume you have cached RDD3, and after some time RDD3 is evicted from
the cache to make room for a new RDD4 that is created and cached.

Now, if you want to access RDD3, it is no longer available in the cache, so
Spark will use the DAG to recompute RDD3. In this way the data in RDD3 is
always available.
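
A minimal sketch of that scenario (the path and the transformations are
hypothetical):

val rdd1 = sc.textFile("hdfs:///data/input.txt")  // RDD1: read from HDFS
val rdd2 = rdd1.map(_.toLowerCase)                // RDD2: transformation
val rdd3 = rdd2.filter(_.contains("error"))       // RDD3: transformation

rdd3.cache()      // ask Spark to keep RDD3 in memory
rdd3.count()      // the first action materializes the cache

rdd3.unpersist()  // simulate eviction, e.g. under memory pressure
rdd3.count()      // recomputed via the DAG: HDFS -> RDD1 -> RDD2 -> RDD3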


Hope this answers your question.

Thank you,
Naresh


On Tue, Dec 12, 2017 at 12:51 AM  wrote:

> Hello Techie’s,
>
>
>
> How is fault tolerance achieved in Spark when data is read from HDFS and
> is held in memory in the form of an RDD?
>
>
>
> Regards
>
> Nikhil
>


Access Array StructField inside StructType.

2017-12-12 Thread satyajit vegesna
Hi All,

How do I iterate over the StructFields nested inside "after" in the schema
below?

StructType(StructField(after, StructType(
    StructField(Alarmed, LongType, true),
    StructField(CallDollarLimit, StringType, true),
    StructField(CallRecordWav, StringType, true),
    StructField(CallTimeLimit, LongType, true),
    StructField(Signature, StringType, true)), true))
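
A sketch of one way to walk those nested fields, assuming the schema above
is available as a StructType (e.g. via df.schema):

import org.apache.spark.sql.types.{StructField, StructType}

def printNestedFields(schema: StructType): Unit =
  schema.fields.foreach {
    // Struct-typed fields such as "after" are unwrapped and their
    // children printed with a qualified name.
    case StructField(name, inner: StructType, _, _) =>
      inner.fields.foreach(f => println(s"$name.${f.name}: ${f.dataType}"))
    case f =>
      println(s"${f.name}: ${f.dataType}")
  }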

Regards,
Satyajit.


Re: How do I save the dataframe data as a pdf file?

2017-12-12 Thread Anthony Thomas
No problem. Assuming your data has been collected as "A:
Array[Array[Double]]", something along the lines of "A.map(x => x.mkString("
& ")).mkString(" \\\\\n")" should do the trick. Another, somewhat more
convoluted, option would be to write your data as a CSV or other delimited
text file and then write a small Python/R wrapper which consumes those and
writes tex tables.
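
For example, a rough sketch of the first approach (the data and the file
name are made up):

val A: Array[Array[Double]] = Array(Array(1.0, 2.0), Array(3.0, 4.0))
val body  = A.map(_.mkString(" & ")).mkString(" \\\\\n")
val table = "\\begin{tabular}{" + "c" * A.head.length + "}\n" +
  body + " \\\\\n" + "\\end{tabular}"
new java.io.PrintWriter("table.tex") { write(table); close() }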

Anthony

On Tue, Dec 12, 2017 at 11:38 AM, anna stax  wrote:

> Thanks Anthony for the response.
>
> Yes, the data in the dataframe represents a report, and I want to create
> pdf files.
> I am using Scala, so I am hoping to find an easier solution in Scala; if
> not, I will try out your suggestion.
>
>
> On Tue, Dec 12, 2017 at 11:29 AM, Anthony Thomas 
> wrote:
>
>> Are you trying to produce a formatted table in a pdf file where the
>> numbers in the table come from a dataframe? I.e. to present summary
>> statistics or other aggregates? If so I would guess your best bet would be
>> to collect the dataframe as a Pandas dataframe and use the to_latex method.
>> You can then use a standard latex compiler to produce a pdf with a table
>> containing that data. I don't know if there's any comparable built-in for
>> Scala, but you could always collect the data as an array of arrays and
>> write these to a tex file using standard IO. Maybe someone has an easier
>> suggestion.
>>
>> On Tue, Dec 12, 2017 at 11:12 AM, shyla deshpande <
>> deshpandesh...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> Is there a way to write the dataframe data as a pdf file?
>>>
>>> Thanks
>>> -Shyla
>>>
>>
>>
>


Re: How do I save the dataframe data as a pdf file?

2017-12-12 Thread anna stax
Thanks Anthony for the response.

Yes, the data in the dataframe represents a report, and I want to create pdf
files.
I am using Scala, so I am hoping to find an easier solution in Scala; if
not, I will try out your suggestion.


On Tue, Dec 12, 2017 at 11:29 AM, Anthony Thomas 
wrote:

> Are you trying to produce a formatted table in a pdf file where the
> numbers in the table come from a dataframe? I.e. to present summary
> statistics or other aggregates? If so I would guess your best bet would be
> to collect the dataframe as a Pandas dataframe and use the to_latex method.
> You can then use a standard latex compiler to produce a pdf with a table
> containing that data. I don't know if there's any comparable built-in for
> Scala, but you could always collect the data as an array of arrays and
> write these to a tex file using standard IO. Maybe someone has an easier
> suggestion.
>
> On Tue, Dec 12, 2017 at 11:12 AM, shyla deshpande <
> deshpandesh...@gmail.com> wrote:
>
>> Hello all,
>>
>> Is there a way to write the dataframe data as a pdf file?
>>
>> Thanks
>> -Shyla
>>
>
>


Re: Json to csv

2017-12-12 Thread Subhash Sriram
I was curious about this too, and found this. You may find it helpful:

http://www.tegdesign.com/converting-a-nested-json-document-to-csv-using-scala-hadoop-and-apache-spark/
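
For a simple shape (one struct plus arrays), the DataFrame API alone can
also get you there. A sketch, assuming Spark 2.x and hypothetical column
names:

import org.apache.spark.sql.functions.{col, explode}

// Read line-delimited JSON; the schema is inferred.
val df = spark.read.json("hdfs:///data/input.json")

// "meta.*" flattens the struct; explode() turns each array element into
// its own row. CSV needs every column to be flat by this point.
df.select(col("meta.*"), explode(col("items")).as("item"))
  .select(col("id"), col("item.*"))
  .write.option("header", "true").csv("hdfs:///data/output")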

Thanks,
Subhash 

Sent from my iPhone

> On Dec 12, 2017, at 1:44 AM, Prabha K  wrote:
> 
> Any help on converting json to csv, or flattening the json file? The json
> file has one struct and multiple arrays.
> Thanks 
> Pk 
> 
> Sent from my iPhone
> 


Re: How do I save the dataframe data as a pdf file?

2017-12-12 Thread Anthony Thomas
Are you trying to produce a formatted table in a pdf file where the numbers
in the table come from a dataframe? I.e. to present summary statistics or
other aggregates? If so I would guess your best bet would be to collect the
dataframe as a Pandas dataframe and use the to_latex method. You can then
use a standard latex compiler to produce a pdf with a table containing that
data. I don't know if there's any comparable built-in for Scala, but you
could always collect the data as an array of arrays and write these to a
tex file using standard IO. Maybe someone has an easier suggestion.

On Tue, Dec 12, 2017 at 11:12 AM, shyla deshpande 
wrote:

> Hello all,
>
> Is there a way to write the dataframe data as a pdf file?
>
> Thanks
> -Shyla
>


How do I save the dataframe data as a pdf file?

2017-12-12 Thread shyla deshpande
Hello all,

Is there a way to write the dataframe data as a pdf file?

Thanks
-Shyla


Unsubscribe

2017-12-12 Thread Olivier MATRAT
Unsubscribe



Re: unsubscribe

2017-12-12 Thread Malcolm Croucher
unsubscribe




On Tue, Dec 12, 2017 at 5:16 PM, Divya Narayan 
wrote:

>


unsubscribe

2017-12-12 Thread Divya Narayan



unsubscribe

2017-12-12 Thread Anshuman Kumar
unsubscribe




Re: RDD[InternalRow] -> Dataset

2017-12-12 Thread Vadim Semenov
It's not possible directly, but you can add your own object to Spark's
org.apache.spark.sql package in your project, which gives you access to its
private methods:

package org.apache.spark.sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.StructType

object DataFrameUtil {
  /**
   * Creates a DataFrame out of an RDD[InternalRow], such as the one
   * returned by `df.queryExecution.toRdd`.
   */
  def createFromInternalRows(sparkSession: SparkSession,
                             schema: StructType,
                             rdd: RDD[InternalRow]): DataFrame = {
    val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession)
    Dataset.ofRows(sparkSession, logicalPlan)
  }
}
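
Usage would then look roughly like this (a sketch; `df` and `spark` are an
existing DataFrame and SparkSession):

val internalRdd = df.queryExecution.toRdd   // RDD[InternalRow]
val restored = DataFrameUtil.createFromInternalRows(spark, df.schema, internalRdd)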


unsubscribe

2017-12-12 Thread Felipe Gustavo
unsubscribe


Re: Union of RDDs Hung

2017-12-12 Thread Gerard Maas
Can you show us the code?

On Tue, Dec 12, 2017 at 9:02 AM, Vikash Pareek 
wrote:

> Hi All,
>
> I am unioning 2 RDDs (each of them having 2 records), but the union is
> getting stuck.
> I found a workaround, which is caching both RDDs before performing the
> union, but I could not figure out the root cause of the job hanging.
>
> Does somebody know why this happens with union?
>
> The Spark version I am using is 1.6.1.
>
>
> Best Regards,
> Vikash Pareek
>


Union of RDDs Hung

2017-12-12 Thread Vikash Pareek
Hi All,

I am unioning 2 RDDs (each of them having 2 records), but the union is
getting stuck.
I found a workaround, which is caching both RDDs before performing the
union, but I could not figure out the root cause of the job hanging.

Does somebody know why this happens with union?

The Spark version I am using is 1.6.1.
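
For reference, a minimal sketch of the pattern described above (the data is
made up):

val rdd1 = sc.parallelize(Seq(1, 2))
val rdd2 = sc.parallelize(Seq(3, 4))

// The workaround mentioned: materialize both RDDs before the union.
rdd1.cache().count()
rdd2.cache().count()

val unioned = rdd1.union(rdd2)
println(unioned.collect().mkString(", "))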


Best Regards,
Vikash Pareek


