Re: Spark parquet file read problem !

2017-07-30 Thread serkan taş
I checked and realised that the schema of the files is different, with some missing fields and some fields with the same name but a different type. How may I overcome the issue? From: pandees waran
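
When files share most fields but disagree on one column's type, a common fix is to load each file separately, cast the conflicting column to a common type, fill missing columns with nulls, and union. A minimal PySpark sketch of that approach; the paths and the column name some_field are placeholders, not from this thread:

    from pyspark.sql import functions as F

    # Placeholder paths and column name; adjust to the real files.
    df1 = spark.read.parquet("hdfs://xxx/20170719/file1.snappy.parquet")
    df2 = spark.read.parquet("hdfs://xxx/20170719/file2.snappy.parquet")

    # Cast the column whose type differs to one common type.
    df2 = df2.withColumn("some_field", F.col("some_field").cast("string"))

    # Add any columns present in df1 but missing from df2 as nulls.
    for c in set(df1.columns) - set(df2.columns):
        df2 = df2.withColumn(c, F.lit(None).cast(df1.schema[c].dataType))

    # Align column order before the union (plain union is positional).
    merged = df1.select(df1.columns).union(df2.select(df1.columns))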

how to get the key in Map with SQL

2017-07-30 Thread ??????????
Hi all, I have a table that looks like: +-+-+ |A|B| |0|[Map(1->a),1]| |0|[Map(1->b),2]| I want to pick up the key and value in the Map. My code looks like df.select($"B._2".alias("X"),$"B._1.key".alias("Y")).show The output is |X|Y| |1|null| |2|null| Would you tell me how
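
For reference: accessing a map column with a literal field name ($"B._1.key") looks up the entry whose key is the string "key", which is why nulls come back. One way to get the actual keys and values in Spark 2.x is to explode the map. A sketch in PySpark (the thread's code is Scala, but the same functions exist there):

    from pyspark.sql import functions as F

    # explode() on a map column produces two columns named `key` and `value`.
    df.select(F.col("B._2").alias("X"),
              F.explode(F.col("B._1"))).show()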

Re: some Ideas on expressing Spark SQL using JSON

2017-07-30 Thread Gourav Sengupta
100% agreed with Sathish. If I am not offending anyone, this kind of question generally comes from individuals who are still in the mindset of the Java way of solving problems from around 10 years back. Therefore you will see a lot of user issues from people who are still used to writing around

Re: Logging in RDD mapToPair of Java Spark application

2017-07-30 Thread ayan guha
Not that I can think of. If you have the Spark History Server running, then it may be another place to look. On Mon, Jul 31, 2017 at 9:48 AM, John Zeng wrote: > Hi, Ayan, > > > Thanks for the suggestion. I did that and got the following weird message > even though I enabled the log

SPARK Issue in Standalone cluster

2017-07-30 Thread Gourav Sengupta
Hi, I am working with a native SPARK standalone cluster (https://spark.apache.org/docs/2.2.0/spark-standalone.html), therefore I do not have HDFS. EXERCISE: It's the most fundamental and simple exercise: create a sample SPARK dataframe, then write it to a location and then read it
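
The usual catch in this setup: on a multi-node standalone cluster without HDFS, a write to a worker-local path scatters part files across machines, so the subsequent read fails unless every node sees the same path (NFS or other shared storage). A minimal sketch, assuming /shared/out is such a shared mount (a placeholder, not from the thread):

    # Sample dataframe, written and read back through a path all nodes can reach.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
    df.write.mode("overwrite").parquet("file:///shared/out/sample.parquet")
    spark.read.parquet("file:///shared/out/sample.parquet").show()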

Re: Logging in RDD mapToPair of Java Spark application

2017-07-30 Thread John Zeng
Hi, Ayan, Thanks for the suggestion. I did that and got the following weird message even though I enabled log aggregation: [root@john1 conf]# yarn logs -applicationId application_1501197841826_0013 17/07/30 16:45:06 INFO client.RMProxy: Connecting to ResourceManager at john1.dg/192.168.6.90:8032

Re: Logging in RDD mapToPair of Java Spark application

2017-07-30 Thread ayan guha
Hi, as you are using YARN log aggregation, YARN moves all the logs to HDFS after the application completes. You can use the following command to get the logs: yarn logs -applicationId <application_id> On Mon, 31 Jul 2017 at 3:17 am, John Zeng wrote: > Thanks Riccardo for the valuable info.
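
For reference, the command as run against the application id from this thread, redirecting to a file since aggregated logs can be long:

    yarn logs -applicationId application_1501197841826_0013 > app_logs.txt

Note that the aggregated logs only appear after the application finishes and yarn.log-aggregation-enable is set in yarn-site.xml.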

Re: Spark parquet file read problem !

2017-07-30 Thread serkan taş
* for what? Yehuda Finkelshtein wrote (30 Jul 2017 20:45): > Try to add "*" at the end of the folder: parquetFile = spark.read.parquet("hdfs://xxx/20170719/*") On Jul 30, 2017 19:13, "pandees waran"

Re: Spark parquet file read problem !

2017-07-30 Thread Yehuda Finkelshtein
Try to add "*" at the end of the folder: parquetFile = spark.read.parquet("hdfs://xxx/20170719/*") On Jul 30, 2017 19:13, "pandees waran" wrote: I have encountered a similar error when the schemas / datatypes conflict between those 2 parquet files. Are you sure

Re: Logging in RDD mapToPair of Java Spark application

2017-07-30 Thread John Zeng
Thanks Riccardo for the valuable info. Following your guidance, I looked at the Spark UI and figured out the default logs location for executors is 'yarn/container-logs'. I ran my Spark app again and I can see a new folder was created for it: [root@john2 application_1501197841826_0013]# ls

Re: Spark parquet file read problem !

2017-07-30 Thread pandees waran
I have encountered a similar error when the schemas / datatypes conflict between those 2 parquet files. Are you sure that the 2 individual files have the same structure with the same datatypes? If not, you have to fix this by enforcing default values for the missing values to make the

Spark parquet file read problem !

2017-07-30 Thread serkan taş
Hi, I have a problem while reading parquet files located in HDFS. If I read the files individually, nothing is wrong and I can get the file content. parquetFile = spark.read.parquet("hdfs://xxx/20170719/part-0-3a9c226f-4fef-44b8-996b-115a2408c746.snappy.parquet") and parquetFile =
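
When individual files read fine but the directory read fails, comparing the per-file schemas usually exposes the mismatch. A quick diagnostic sketch; the second path is a placeholder since the original message is truncated:

    p1 = spark.read.parquet("hdfs://xxx/20170719/part-0-3a9c226f-4fef-44b8-996b-115a2408c746.snappy.parquet")
    p2 = spark.read.parquet("hdfs://xxx/20170719/<other-part-file>.snappy.parquet")  # placeholder path
    p1.printSchema()
    p2.printSchema()  # diff the two outputs to find the conflicting field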

OrderedDict to DF

2017-07-30 Thread ayan guha
Hi, I have an OrderedDict in Python, and I would like to convert it to a DF with columns in the same order. from collections import OrderedDict str = [OrderedDict([(u'MID', 15784879), (u'START_DATE', u'1983-06-16 00:00:00'), (u'END_DATE', u'1984-01-31 00:00:00'), (u'AUDIT_ID', u'16994174'),
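
A minimal sketch of one way to do this: take the column order from the OrderedDict keys and pass it as the schema, since building the DataFrame from plain dicts would not preserve order. The sample record is the one from the message; the variable names are placeholders:

    from collections import OrderedDict

    data = [OrderedDict([(u'MID', 15784879),
                         (u'START_DATE', u'1983-06-16 00:00:00'),
                         (u'END_DATE', u'1984-01-31 00:00:00'),
                         (u'AUDIT_ID', u'16994174')])]

    # Keys of an OrderedDict keep insertion order, so use them as column
    # names and feed the values in as tuples.
    columns = list(data[0].keys())
    rows = [tuple(d.values()) for d in data]
    df = spark.createDataFrame(rows, schema=columns)
    df.show()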