Re: Parquet file generated by Spark, but not compatible read by Hive

2017-06-12 Thread ayan guha
Try setting the following parameter: conf.set("spark.sql.hive.convertMetastoreParquet", "false") On Tue, Jun 13, 2017 at 3:34 PM, Angel Francisco Orta < angel.francisco.o...@gmail.com> wrote: > Hello, > > Do you use df.write, or do you use hivecontext.sql("insert into ...")? > > Angel. > > On 12 Jun.
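
A minimal sketch of that suggestion, assuming the thread's Spark 1.6 setup (app and table names are illustrative; on Spark 2.x the same key would go through spark.conf.set):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hive-parquet-compat"))
val hiveContext = new HiveContext(sc)

// With this off, Spark uses the Hive SerDe for Hive Parquet tables instead of
// its own Parquet support, which can avoid reader/writer incompatibilities.
hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
```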

Re: Parquet file generated by Spark, but not compatible read by Hive

2017-06-12 Thread Angel Francisco Orta
Hello, Do you use df.write, or do you use hivecontext.sql("insert into ...")? Angel. On 12 Jun. 2017 at 11:07 p.m., "Yong Zhang" wrote: > We are using Spark *1.6.2* as ETL to generate parquet file for one > dataset, and partitioned by "brand" (which is a string to
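
For reference, a sketch of the two write paths being asked about, in Spark 1.6 style (table, view, and path names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("write-paths"))
val hiveContext = new HiveContext(sc)
val df = hiveContext.table("staging_table")   // illustrative source

// 1) DataFrame writer: Spark itself lays out the Parquet files.
df.write.partitionBy("brand").parquet("hdfs:///warehouse/my_dataset")

// 2) Hive insert: the write goes through the Hive table definition.
df.registerTempTable("source_view")
hiveContext.sql("INSERT INTO TABLE my_dataset SELECT * FROM source_view")
```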

Re: Use SQL Script to Write Spark SQL Jobs

2017-06-12 Thread Benjamin Kim
Hi Bo, +1 for your project. I come from the world of data warehouses, ETL, and reporting analytics. There are many individuals who do not know how to code or do not want to do any coding. They are content with ANSI SQL and stick to it. ETL workflows are also done without any coding, using a drag-and-drop user

Re: Use SQL Script to Write Spark SQL Jobs

2017-06-12 Thread bo yang
Hi Aakash, Thanks for your willingness to help :) It would be great if I could get more feedback on my project. For example, do other people also feel the need for a script-based way to write Spark jobs easily? Also, I would explore whether it is possible that the Spark project takes some work to

Re: Deciphering spark warning "Truncated the string representation of a plan since it was too large."

2017-06-12 Thread lucas.g...@gmail.com
AFAIK the process a Spark program follows is: 1. A set of transformations is defined on a given input dataset. 2. At some point an action is called (in your case, writing to your parquet file). 3. When that happens Spark creates a logical plan and then a physical plan
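
A small sketch of those steps, assuming Spark 2.x (the tiny dataset and output path are just for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("plan-demo").getOrCreate()
import spark.implicits._

// 1. Transformations only: nothing executes yet.
val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
val transformed = df.filter($"id" > 1).select($"label")

// Inspect the logical and physical plans Spark will use.
transformed.explain(true)

// 2./3. The action: the plans are built and the job actually runs here.
transformed.write.mode("overwrite").parquet("/tmp/plan_demo")
```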

Deciphering spark warning "Truncated the string representation of a plan since it was too large."

2017-06-12 Thread Henry M
I am trying to understand if I should be concerned about this warning: "WARN Utils:66 - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf" It occurs while writing a data frame to
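
A minimal sketch of how that setting can be raised, assuming Spark 2.x where the key is read from the Spark conf at startup (the value 200 is just an example):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Raising the limit keeps long plan strings from being truncated in the logs.
val conf = new SparkConf().set("spark.debug.maxToStringFields", "200")
val spark = SparkSession.builder()
  .config(conf)
  .appName("maxToStringFields-demo")
  .getOrCreate()
```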

broadcast() multiple times the same df. Is it cached ?

2017-06-12 Thread matd
Hi spark folks, In our application, we have to join a dataframe with several other dfs (not always on the same joining column). This left-hand side df is not very large, so a broadcast hint may be beneficial. My questions: - if the same df gets broadcast multiple times, will the transfer occur once
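
For concreteness, a sketch of the pattern in question (paths and join columns are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("broadcast-hint-demo").getOrCreate()

val small = spark.read.parquet("/data/small_df")   // the left-hand side df
val big1  = spark.read.parquet("/data/big_one")
val big2  = spark.read.parquet("/data/big_two")

// The same small df is hinted for broadcast in two joins on different columns;
// whether the second join reuses the already-transferred copy is the question above.
val joined1 = big1.join(broadcast(small), Seq("key_a"))
val joined2 = big2.join(broadcast(small), Seq("key_b"))
```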

Parquet file generated by Spark, but not compatible read by Hive

2017-06-12 Thread Yong Zhang
We are using Spark 1.6.2 as ETL to generate the parquet files for one dataset, partitioned by "brand" (which is a string representing the brand in this dataset). After the partition folders such as "brand=a" are generated in HDFS, we add the partitions in Hive. The Hive version is 1.2.1 (In
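
A sketch of the workflow being described, with illustrative table and path names (the Hive statement is shown as a comment since it runs on the Hive side):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("brand-etl"))
val hiveContext = new HiveContext(sc)
val dataset = hiveContext.table("staging_table")   // illustrative source

// Spark writes one folder per brand value, e.g. .../my_dataset/brand=a
dataset.write.partitionBy("brand").parquet("hdfs:///warehouse/my_dataset")

// Each new folder is then registered on the Hive (1.2.1) side, e.g.:
// ALTER TABLE my_dataset ADD IF NOT EXISTS PARTITION (brand='a')
//   LOCATION 'hdfs:///warehouse/my_dataset/brand=a';
```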

Re: [Spark JDBC] Does spark support read from remote Hive server via JDBC

2017-06-12 Thread Thakrar, Jayesh
Could this be due to https://issues.apache.org/jira/browse/HIVE-6 ? From: Patrik Medvedev Date: Monday, June 12, 2017 at 2:31 AM To: Jörn Franke , vaquar khan Cc: Jean Georges Perrin , User

Re: Use SQL Script to Write Spark SQL Jobs

2017-06-12 Thread Aakash Basu
Hey, I work on Spark SQL and would pretty much be able to help you in this. Let me know your requirement. Thanks, Aakash. On 12-Jun-2017 11:00 AM, "bo yang" wrote: > Hi Guys, > > I am writing a small open source project > to use

RE: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-12 Thread Mohammed Guller
Regarding Spark scheduler – if you are referring to the ability to distribute workload and scale, Kafka Streaming also provides that capability. It is deceptively simple in that regard if you already have a Kafka cluster. You can launch multiple instances of your Kafka streaming application and
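
As a rough illustration of that scale-out model (Kafka Streams used from Scala, assuming a Kafka 1.0+ API; topic and application names are made up): every instance started with the same application.id joins the same group and takes over a share of the input partitions.

```scala
import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.KStream
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

val props = new Properties()
// All instances sharing this application.id form one group and split the
// input topic's partitions among themselves.
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streaming-app")
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092")
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass.getName)
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass.getName)

val builder = new StreamsBuilder()
val source: KStream[String, String] = builder.stream("input-topic")
source.to("output-topic")   // trivial pass-through topology

val streams = new KafkaStreams(builder.build(), props)
streams.start()             // run the same jar on more machines to scale out
```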

Re: [E] Re: Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-12 Thread Rastogi, Pankaj
Please make sure that you have enough memory available on the driver node. If there is not enough free memory on the driver node, then your application won't start. Pankaj From: vaquar khan > Date: Saturday, June 10, 2017 at 5:02 PM To:
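
For example (a sketch only; cluster manager, class, and sizes are made up), the driver memory requested at submit time has to fit on whichever node ends up hosting the driver:

```bash
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  --driver-memory 8g \
  --executor-memory 4g \
  my-job.jar
```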

Re: [How-To] Custom file format as source

2017-06-12 Thread Vadim Semenov
It should be easy to start with a custom Hadoop InputFormat that reads the file and creates an `RDD[Row]`. Since you know the record size, it should be pretty easy to make the InputFormat produce splits, so you could then read the file in parallel. On Mon, Jun 12, 2017 at 6:01 AM, OBones
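
A sketch along those lines, using Hadoop's built-in FixedLengthInputFormat for the known-record-size case (path, record length, and the decoding step are illustrative; a fully custom InputFormat would replace it for anything richer):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().appName("custom-binary-format").getOrCreate()
val sc = spark.sparkContext

val conf = new Configuration(sc.hadoopConfiguration)
FixedLengthInputFormat.setRecordLength(conf, 32)   // bytes per record, known up front

// Splits are produced per record boundary, so the file is read in parallel.
val records = sc.newAPIHadoopFile(
  "hdfs:///data/file.bin",
  classOf[FixedLengthInputFormat],
  classOf[LongWritable],
  classOf[BytesWritable],
  conf
)

// Decode each fixed-length record into a Row (decoding depends on the format).
val rows = records.map { case (_, bytes) => Row(bytes.copyBytes()) }
```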

Re: SPARK environment settings issue when deploying a custom distribution

2017-06-12 Thread Chanh Le
Just to add more information on how I build the custom distribution: I clone the Spark repo, switch to branch-2.2, and then make the distribution as follows. λ ~/workspace/big_data/spark/ branch-2.2* λ ~/workspace/big_data/spark/ ./dev/make-distribution.sh --name custom --tgz -Phadoop-2.7

SPARK environment settings issue when deploying a custom distribution

2017-06-12 Thread Chanh Le
Hi everyone, Recently I discovered an issue when processing CSV in Spark, so I decided to fix it following https://issues.apache.org/jira/browse/SPARK-21024 and built a custom distribution for internal use. I built it on my local machine, then uploaded the distribution to the server. The server's

RE: [How-To] Custom file format as source

2017-06-12 Thread Mendelson, Assaf
Try https://mapr.com/blog/spark-data-source-api-extending-our-spark-sql-query-engine/ Thanks, Assaf. -Original Message- From: OBones [mailto:obo...@free.fr] Sent: Monday, June 12, 2017 1:01 PM To: user@spark.apache.org Subject: [How-To] Custom file format as source
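
The linked post is about the Data Source API; a bare-bones sketch of that route (class names, the stubbed scan, and the single-column schema are all illustrative, not taken from the post):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

class CustomBinaryRelationProvider extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    new CustomBinaryRelation(parameters("path"))(sqlContext)
}

class CustomBinaryRelation(path: String)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // The schema would normally be decoded from the file's own column list.
  override def schema: StructType = StructType(Seq(StructField("value", LongType)))

  // buildScan would decode the binary file at `path` into rows; stubbed here.
  override def buildScan(): RDD[Row] = sqlContext.sparkContext.emptyRDD[Row]
}

// Usage (hypothetical package name):
// spark.read.format("com.example.CustomBinaryRelationProvider").load("/data/file.bin")
```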

[How-To] Custom file format as source

2017-06-12 Thread OBones
Hello, I have an application here that generates data files in a custom binary format that provides the following information: a column list, where each column has a data type (64-bit integer, 32-bit string index, 64-bit IEEE float, 1-byte boolean); catalogs that give modalities for some columns (i.e.,

Re: [Spark JDBC] Does spark support read from remote Hive server via JDBC

2017-06-12 Thread Patrik Medvedev
Hello, All security checks are disabled, but I still don't have any info in the result. Sun, 11 Jun 2017 at 14:24, Jörn Franke : > Is Sentry preventing the access? > > On 11 Jun 2017, at 01:55, vaquar khan wrote: > > Hi, > Please check your firewall