Re: How to print DataFrame.show(100) to text file at HDFS

2019-04-13 Thread Nuthan Reddy
Hi Chetan,

You can use:

    spark-submit showDF.py | hadoop fs -put - showDF.txt

showDF.py:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Write stdout").getOrCreate()
    spark.sparkContext.setLogLevel("OFF")
    spark.table("").show(100, truncate=False)

But is there any
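A pure-Python variant of the same idea is to capture what .show() prints using contextlib.redirect_stdout and then ship the captured string to HDFS yourself. A minimal sketch, using a stand-in object in place of a real pyspark DataFrame (the FakeDF class and its output are illustrative only; with a real DataFrame you would call df.show(100, truncate=False) inside the same `with` block):

```python
import io
from contextlib import redirect_stdout

# Stand-in for a pyspark DataFrame: all that matters here is that
# .show() prints its table to stdout, which redirect_stdout captures.
class FakeDF:
    def show(self, n=20, truncate=True):
        print("+---+\n| id|\n+---+\n|  1|\n+---+")

df = FakeDF()
buf = io.StringIO()
with redirect_stdout(buf):
    df.show(100, truncate=False)
table_text = buf.getvalue()

# table_text now holds the rendered table and could be piped to HDFS,
# for example:
#   subprocess.run(["hadoop", "fs", "-put", "-", "showDF.txt"],
#                  input=table_text.encode())
print(table_text.splitlines()[0])  # → +---+
```

This avoids relying on spark-submit's stdout being clean (driver logs can interleave with the table unless logging is silenced, as in the snippet above).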

Re: writing into oracle database is very slow

2019-04-13 Thread Yeikel
Are you sure you only need 10 partitions? Do you get the same performance writing to HDFS with 10 partitions?

Re: Best Practice for Writing data into a Hive table

2019-04-13 Thread Yeikel
Writing to CSV is very slow. From what I've seen, this is the preferred way to write to Hive:

    myDf.createOrReplaceTempView("mytempTable")
    sqlContext.sql("create table mytable as select * from mytempTable")

Source :

RE: Question about relationship between number of files and initial tasks(partitions)

2019-04-13 Thread email
Before we conclude that the issue is skewed data, let's confirm it:

    import org.apache.spark.sql.functions.spark_partition_id
    df.groupBy(spark_partition_id).count

This should give the number of records you have in each partition.

From: Sagar Grover Sent: Thursday, April

Best Practice for Writing data into a Hive table

2019-04-13 Thread Debabrata Ghosh
Hi,

Please can you let me know which of the following options would be a best practice for writing data into a Hive table:

Option 1:

    outputDataFrame.write
      .mode(SaveMode.Overwrite)
      .format("csv")
      .save("hdfs_path")

Option 2: Get the data from a dataframe and

ApacheCon NA 2019 Call For Proposal and help promoting Spark project

2019-04-13 Thread Felix Cheung
Hi Spark community! As you know, ApacheCon NA 2019 is coming this Sept, and its CFP is now open! This is an important milestone as we celebrate 20 years of the ASF. We have tracks like Big Data and Machine Learning, among many others. Please submit your talks/thoughts/challenges/learnings here:

Offline state manipulation tool for structured streaming query

2019-04-13 Thread Jungtaek Lim
Hi Spark users, especially Structured Streaming users who are dealing with stateful queries. I'm pleased to introduce Spark State Tools, which enables offline state manipulation for Structured Streaming queries. Basically, the tool exposes state as a batch source and output, so that you can read

How to print DataFrame.show(100) to text file at HDFS

2019-04-13 Thread Chetan Khatri
Hello Users, In Spark, when I have a DataFrame and call .show(100), I want to save the printed output, as-is, to a text file in HDFS. How can I do this? Thanks