RE: How to specify default value for StructField?
Thanks Yan and Yong. Yes, from Spark I can access ORC files loaded into Hive tables. Thanks.

From: 颜发才 (Yan Facai) [mailto:facai@gmail.com]
Sent: Friday, February 17, 2017 6:59 PM
To: Yong Zhang <java8...@hotmail.com>
Cc: Begar, Veena <veena.be...@hpe.com>; smartzjp <zjp_j...@163.com>; user@spark.apache.org
Subject: Re: How to specify default value for StructField?
Re: How to specify default value for StructField?
I agree with Yong Zhang; perhaps Spark SQL with Hive could solve the problem:
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
Re: How to specify default value for StructField?
If it works under Hive, did you try just creating the DF from the Hive table directly in Spark? That should work, right?

Yong
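The Hive-backed approach suggested above can be sketched roughly as follows. This is only a sketch, not code from the thread: it assumes Spark was built with Hive support, and the table name `orc_files_test` is hypothetical (the thread never names the Hive table).

```scala
import org.apache.spark.sql.SparkSession

// Assumes a Hive metastore is available and the ORC files have already
// been loaded into a Hive table (hypothetical name: orc_files_test).
val spark = SparkSession.builder()
  .appName("orc-via-hive")
  .enableHiveSupport()
  .getOrCreate()

// Read through the Hive table instead of the raw ORC folder;
// Hive handles the schema differences across the underlying files.
val df = spark.table("orc_files_test")
df.select("f1", "f2").show()
```

Reading through the Hive table works because Hive resolves each file against the table's declared schema, filling missing columns with NULL, rather than inferring the schema from a single file the way the raw ORC reader does.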
RE: How to specify default value for StructField?
Thanks Yong.

I know about the schema-merging option. Using Hive we can read AVRO files having different schemas, and we can do the same in Spark. Similarly, we can read ORC files having different schemas in Hive, but we can't do the same in Spark using a DataFrame. How can we do it using a DataFrame?

Thanks.
Re: How to specify default value for StructField?
You may be looking for something like "spark.sql.parquet.mergeSchema" for ORC. Unfortunately, I don't think it is available, unless someone tells me I am wrong.

You can create a JIRA to request this feature, but we all know that Parquet is the first-class citizen format.

Yong
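For readers of the archive: the Parquet schema merging referred to above looks like the sketch below, and, if I recall correctly, later Spark releases (3.0+) added the same `mergeSchema` option for ORC — the version claim and the Parquet path are assumptions, not from the thread; only the ORC path appears in it.

```scala
// Parquet schema merging has long been built in (off by default for cost):
val mergedParquet = spark.read
  .option("mergeSchema", "true")
  .parquet("/user/hos/parquet_files")   // hypothetical path

// In later Spark releases (3.0+), ORC is believed to support the same option:
val mergedOrc = spark.read
  .option("mergeSchema", "true")
  .orc("/user/hos/orc_files_test_together")
```

With merging enabled, the reader unions the schemas of all files, and rows from files missing a column get NULL for it.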
RE: How to specify default value for StructField?
Thanks, it didn't work, because the folder has files from 2 different schemas. It fails with the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`f2`' given input columns: [f1];

-----Original Message-----
From: smartzjp [mailto:zjp_j...@163.com]
Sent: Tuesday, February 14, 2017 10:32 AM
To: Begar, Veena <veena.be...@hpe.com>; user@spark.apache.org
Subject: Re: How to specify default value for StructField?
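One workaround for the exception above, not proposed in the thread itself, is to read each schema's files separately, add the missing column with a default value, and union the results. The subfolder paths below are hypothetical (the thread only mentions a single mixed folder); it assumes a SparkSession named `spark`.

```scala
import org.apache.spark.sql.functions.lit

// Read each schema's files from separate (hypothetical) subfolders.
val dfOld = spark.read.format("orc").load("/user/hos/orc_files_old")  // columns: f1
val dfNew = spark.read.format("orc").load("/user/hos/orc_files_new")  // columns: f1, f2

// Supply a default value for the column missing from the old files
// (null here; lit("n/a") would give a non-null default), then align
// column order and union the two DataFrames.
val dfOldPadded = dfOld.withColumn("f2", lit(null).cast("string"))
val df = dfOldPadded.select("f1", "f2").union(dfNew.select("f1", "f2"))

df.show()
```

Note that `union` resolves columns by position, hence the explicit `select` to align the two sides; on Spark 2.3+ `unionByName` resolves by name instead.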
Re: How to specify default value for StructField?
You can try the below code.

val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
df.select("f1", "f2").show

On 2017/2/14 at 6:54 AM, "vbegar"