How does Spark handle timestamps during Pandas dataframe conversion

2017-07-27 Thread saatvikshah1994
I've summarized this question in detail, with code snippets and logs, in this StackOverflow post: https://stackoverflow.com/questions/45308406/how-does-spark-handle-timestamp-types-during-pandas-dataframe-conversion/. I'm looking for efficient solutions to this.
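
For reference, a minimal sketch of the round trip in question (assuming a local SparkSession; the column name ts is illustrative):

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    pdf = pd.DataFrame({"ts": pd.to_datetime(["2017-07-27 12:00:00"])})

    # pandas datetime64[ns] columns map to Spark's TimestampType;
    # naive values are interpreted in the session/JVM time zone.
    sdf = spark.createDataFrame(pdf)
    sdf.printSchema()              # ts: timestamp (nullable = true)

    # The reverse conversion yields datetime64[ns] again.
    print(sdf.toPandas().dtypes)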

Informing Spark about specific Partitioning scheme to avoid shuffles

2017-07-22 Thread saatvikshah1994
Hi everyone, My environment is PySpark with Spark 2.0.0. I'm using Spark to load data from a large number of files into a Spark dataframe with fields, say, field1 to field10. While loading the data I have ensured that records are partitioned by field1 and field2 (without using partitionBy). This
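
One commonly suggested workaround, sketched below under stated assumptions (parquet input and the placeholder field names from the message): repartition once on the key fields so that later wide operations on the same keys can reuse that partitioning instead of shuffling again.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.read.parquet("/path/to/files")   # placeholder input path

    # One explicit shuffle up front; the resulting hash partitioning
    # on (field1, field2) is known to the optimizer.
    df = df.repartition("field1", "field2").cache()

    # An aggregation on the same keys can then be planned without
    # a further exchange.
    counts = df.groupBy("field1", "field2").count()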

Spark UI crashes on Large Workloads

2017-07-17 Thread saatvikshah1994
Hi, I have a PySpark app which, when given a huge amount of data as input, sometimes throws the error explained here: https://stackoverflow.com/questions/32340639/unable-to-understand-error-sparklistenerbus-has-already-stopped-dropping-event. All my code runs inside the main function, and
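
The error in that link means the listener bus is dropping events under load; one mitigation that is often suggested (an assumption here, not a verified fix) is to enlarge the event queue:

    from pyspark.sql import SparkSession

    # Spark 2.x: raise the bounded queue that the UI/event-log listeners
    # drain, so bursts of task events are less likely to be dropped.
    spark = (SparkSession.builder
             .config("spark.scheduler.listenerbus.eventqueue.size", "100000")
             .getOrCreate())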

PySpark working with Generators

2017-06-29 Thread saatvikshah1994
Hi, I have a file-reading function called /foo/ which reads a file's contents either into a list of lists or into a generator of lists of lists representing the same file. When reading it as a complete chunk (one record array) I do something like: rdd = file_paths_rdd.map(lambda x:
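
For contrast, a sketch of the two read styles (foo, foo_gen, and file_paths_rdd are stand-ins for the objects described above); flatMap consumes a generator lazily, one record at a time:

    def foo(path):
        # Eager: materialize the whole file as a list of lists.
        with open(path) as f:
            return [line.rstrip("\n").split(",") for line in f]

    def foo_gen(path):
        # Lazy: yield one record (a list) at a time.
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n").split(",")

    records_eager = file_paths_rdd.flatMap(foo)      # whole file held in memory per task
    records_lazy = file_paths_rdd.flatMap(foo_gen)   # streamed record by record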

Using Spark with Local File System/NFS

2017-06-22 Thread saatvikshah1994
Hi, I've downloaded and kept the same set of data files on all my cluster nodes, at the same absolute path - say /home/xyzuser/data/*. I am now trying to perform an operation (say open(filename).read()) on all these files in Spark, by passing local file paths. I was under the assumption that
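
A sketch of that setup, assuming the files really are present at the same absolute path on every node; the glob runs on the driver, while each open() happens on whichever executor runs the task:

    import glob
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    paths = glob.glob("/home/xyzuser/data/*")    # expanded on the driver
    contents = sc.parallelize(paths).map(lambda p: open(p).read())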

Merging multiple Pandas dataframes

2017-06-19 Thread saatvikshah1994
Hi, I am iteratively receiving a file which can only be opened as a Pandas dataframe. For the first such file I receive, I convert it to a Spark dataframe using the 'createDataFrame' utility function. From the next file onward, I convert it and union it into the first Spark dataframe.
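
A sketch of that incremental pattern (pandas_frames is a hypothetical stand-in for the files received so far):

    from functools import reduce
    from pyspark.sql import DataFrame, SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Convert each received pandas dataframe, then fold union over the
    # list; note the query lineage still grows with every union.
    sdfs = [spark.createDataFrame(pdf) for pdf in pandas_frames]
    merged = reduce(DataFrame.union, sdfs)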

Best alternative for Category Type in Spark Dataframe

2017-06-15 Thread saatvikshah1994
Hi, I'm trying to convert a Pandas dataframe to a Spark dataframe. One of my columns is of the Category type in Pandas, but there does not seem to be an equivalent type in Spark. What is the best alternative?
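
The usual workaround (stated as an assumption, not a confirmed best practice) is to cast the Categorical column to strings before the transfer and, if a numeric encoding is needed, re-index on the Spark side:

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import StringIndexer

    spark = SparkSession.builder.getOrCreate()

    pdf = pd.DataFrame({"color": pd.Categorical(["red", "blue", "red"])})
    pdf["color"] = pdf["color"].astype(str)     # Category -> plain strings

    sdf = spark.createDataFrame(pdf)            # arrives as StringType

    # Rebuild an integer encoding in Spark if one is needed downstream.
    indexed = (StringIndexer(inputCol="color", outputCol="color_idx")
               .fit(sdf).transform(sdf))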