RE: How to specify default value for StructField?

2017-02-21 Thread Begar, Veena
Thanks Yan and Yong,

Yes, from Spark I can access ORC files loaded into Hive tables.

Thanks.
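For reference, going through a Hive table is a sketch like the following (the table name is hypothetical, and it assumes a SparkSession built with Hive support): Hive resolves the table schema, so columns missing from older ORC files come back as NULL instead of failing analysis.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: requires a Hive metastore; "orc_files_test" is a hypothetical table.
val spark = SparkSession.builder()
  .appName("hive-orc-read")
  .enableHiveSupport()
  .getOrCreate()

// The table schema comes from the metastore, not from individual files,
// so f2 resolves even for files written before f2 was added.
val df = spark.table("orc_files_test")
df.select("f1", "f2").show()
```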


Re: How to specify default value for StructField?

2017-02-17 Thread Yan Facai
I agree with Yong Zhang; perhaps Spark SQL with Hive could solve the problem:

http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables


Re: How to specify default value for StructField?

2017-02-15 Thread Yong Zhang
If it works under Hive, did you try creating the DataFrame from the Hive table directly in Spark? That should work, right?


Yong


RE: How to specify default value for StructField?

2017-02-15 Thread Begar, Veena
Thanks Yong.

I know about the schema-merging option.
Using Hive we can read Avro files having different schemas, and we can do the same in Spark.
Similarly, we can read ORC files having different schemas in Hive, but we can't do the same in Spark using a DataFrame. How can we do it with a DataFrame?

Thanks.
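One possible workaround, not suggested in the thread and sketched here under assumptions (the paths and column names are hypothetical): read each group of ORC files separately, pad the missing columns with a default value, and union the results.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Pad a DataFrame out to a target column list, filling columns it
// lacks with NULL (or any default) so the frames can be unioned.
def alignTo(df: DataFrame, columns: Seq[String]): DataFrame = {
  val padded = columns.map { c =>
    if (df.columns.contains(c)) col(c)
    else lit(null).cast("string").as(c)   // default value for the missing field
  }
  df.select(padded: _*)
}

val spark = SparkSession.builder().getOrCreate()
val oldDf = spark.read.orc("/user/hos/orc_files_old")   // schema: f1
val newDf = spark.read.orc("/user/hos/orc_files_new")   // schema: f1, f2
val allCols = Seq("f1", "f2")
val merged = alignTo(oldDf, allCols).union(alignTo(newDf, allCols))
```

This requires knowing which files belong to which schema (e.g. by directory), since a single `load` over the mixed folder picks one file's schema.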

Re: How to specify default value for StructField?

2017-02-14 Thread Yong Zhang
You may be looking for something like "spark.sql.parquet.mergeSchema" for ORC. Unfortunately, I don't think it is available, unless someone tells me I am wrong.

You can create a JIRA to request this feature, but we all know that Parquet is the first-class citizen format.

Yong
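For comparison, the Parquet schema merging mentioned above can be requested per read; this is a sketch with a hypothetical path, and columns absent from older files come back as NULL:

```scala
// Schema merging is off by default for Parquet; enable it per read.
val df = spark.read
  .option("mergeSchema", "true")
  .parquet("/user/hos/parquet_files_together")
df.printSchema()   // union of the schemas found across the files
```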


RE: How to specify default value for StructField?

2017-02-14 Thread Begar, Veena
Thanks, but it didn't work, because the folder has files with two different schemas.
It fails with the following exception:
org.apache.spark.sql.AnalysisException: cannot resolve '`f2`' given input 
columns: [f1];


Re: How to specify default value for StructField?

2017-02-14 Thread smartzjp
You can try the below code.

val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
df.select("f1", "f2").show
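When the files under one path do not share a schema, it may help to first check which columns Spark actually inferred before selecting; a minimal sketch:

```scala
val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
df.printSchema()            // shows the inferred fields, e.g. only f1
df.columns.foreach(println) // selecting a column not listed here fails analysis
```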
