Re: [DISCUSS] Supporting hive on DataSourceV2

2020-03-23 Thread Ryan Blue
Hi Jacky, We’ve internally released support for Hive tables (and Spark FileFormat tables) using DataSourceV2 so that we can switch between catalogs; sounds like that’s what you are planning to build as well. It would be great to work with the broader community on a Hive connector. I will get a …

Re: \r\n in csv output

2020-03-23 Thread Vipul Rajan
You can use newAPIHadoopFile:

import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val conf = new Configuration
conf.set("textinputformat.record.delimiter", "\r\n")
val …
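
The preview cuts off at the read call itself; below is a minimal self-contained sketch of the approach (the input path and variable names are illustrative, not from the original message):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("crlf-records").getOrCreate()
val sc = spark.sparkContext

// Tell the Hadoop input format to treat "\r\n" as the record delimiter.
val conf = new Configuration(sc.hadoopConfiguration)
conf.set("textinputformat.record.delimiter", "\r\n")

// Read with the custom delimiter; the Text values are the record bodies.
val records = sc
  .newAPIHadoopFile("/path/to/input.csv", classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text], conf)
  .map { case (_, text) => text.toString }

records.take(5).foreach(println)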

Re: \r\n in csv output

2020-03-23 Thread Steven Parkes
Hrm ... looks like we were setting this in the past, although it looks like it was being ignored ... On Mon, Mar 23, 2020 at 12:53 PM Steven Parkes wrote: > SPARK-26108 / PR#23080 added a require …

\r\n in csv output

2020-03-23 Thread Steven Parkes
SPARK-26108 / PR#23080 added a require on CSVOptions#lineSeparator that it be a single character. AFAICT, this keeps us from writing CSV files with \r\n line terminators. Wondering if this was intended or …
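
If the restriction works the way this message describes, a write with a two-character separator should be rejected by that require. A minimal sketch of the scenario (paths and DataFrame are illustrative; behavior assumed from the change under discussion, not verified here):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("crlf-write").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")

// A two-character separator such as "\r\n" is expected to fail the
// single-character require on CSVOptions#lineSeparator at write time.
df.write
  .option("lineSep", "\r\n")
  .csv("/tmp/crlf-out")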

Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread Sean Owen
No, as I say, it seems to just generate a warning. Compressed oops can't be used with a >= 32GB heap, so they just aren't used. That's why I am asking what the problem is. Spark doesn't set this value as far as I can tell; maybe your environment does. This is, in any event, not a Spark issue per se. On Mon, Mar 23, 2020 at …

Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread angers . zhu
If -Xmx is bigger than 32g, the VM will not use UseCompressedOops by default. We can see a case: if we set spark.driver.memory to 64g, set -XX:+UseCompressedOops in spark.executor.extraJavaOptions, and set SPARK_DAEMON_MEMORY=6g, then with the current code the VM gets a command like …

Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread Sean Owen
I'm still not sure whether you are trying to enable it or disable it, or what the issue is. There is no logic in Spark that sets or disables this flag that I can see. On Mon, Mar 23, 2020 at 9:27 AM angers.zhu wrote: > Hi Sean, > > Yea, I set -XX:+UseCompressedOops in the driver (you can see in …

Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread angers . zhu
Hi Sean, Yea, I set -XX:+UseCompressedOops in the driver (you can see it in the command line), and these days we have more users, so I set spark.driver.memory to 64g. Under Non-default VM flags it should now be -XX:-UseCompressedOops, but it's still -XX:+UseCompressedOops. I have found the …

Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread Sean Owen
I don't think Spark sets UseCompressedOops in any defaults; are you setting it? It can't be used with heaps >= 32GB. It doesn't seem to cause an error if you set it with a large heap, just a warning. What's the problem? On Mon, Mar 23, 2020 at 6:21 AM angers.zhu wrote: > Hi developers, …

Partition by Custom Wrapping

2020-03-23 Thread nirmit jain
Hi Developer, Can someone help me write custom partitioning? Instead of writing data in the hierarchical format like this:

Root/
  Data=A/
  Data=B/

I want something like this:

Data/
  A/
    ROOT/
  B/
    ROOT/

If you could tell me what are the …
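
There is no built-in writer option for that inverted layout; one sketch of a workaround (the column name Data and the ROOT directory come from the layout above, everything else, including paths, is illustrative) is to loop over the distinct partition values and write each slice to its own path:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("custom-layout").getOrCreate()
val df = spark.read.parquet("/path/to/source") // illustrative input

// Collect the distinct partition values, then write each slice under
// Data/<value>/ROOT instead of the default Root/Data=<value> layout.
val values = df.select("Data").distinct().collect().map(_.get(0).toString)
values.foreach { v =>
  df.filter(col("Data") === v)
    .drop("Data") // mimic partitionBy, which drops the partition column
    .write
    .mode("overwrite")
    .parquet(s"/output/Data/$v/ROOT")
}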

Spark Thrift Server java vm problem need help

2020-03-23 Thread angers . zhu
Hi developers, These days I've hit a strange problem and I can't find out why. When I start a Spark Thrift Server with spark.driver.memory 64g and then use jdk8/bin/jinfo pid to look at the VM flags, I get the information below. In a 64g VM, UseCompressedOops should be disabled by default, so why does Spark …
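
One way to check what the running JVM actually decided, independent of jinfo, is to query the HotSpot diagnostic MXBean from inside the driver. A minimal sketch, assuming a HotSpot JVM (this is a generic JVM check, not anything Spark-specific):

import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

// Query the live value of UseCompressedOops from inside the JVM.
// On HotSpot, compressed oops are silently disabled for heaps >= 32GB
// even if -XX:+UseCompressedOops was passed on the command line.
val hotspot = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
val option = hotspot.getVMOption("UseCompressedOops")
println(s"UseCompressedOops = ${option.getValue} (origin: ${option.getOrigin})")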

[DISCUSS] Supporting hive on DataSourceV2

2020-03-23 Thread JackyLee
Hi devs, I'd like to start a discussion about supporting Hive on DataSourceV2. We're now working on a project that uses DataSourceV2 to provide multi-source support, and it works very well with the data lake solution, yet it does not yet support HiveTable. There are 3 reasons why we need to support …
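
For context, the catalog-plugin side of DataSourceV2 that this discussion targets lets a Hive connector be registered by name. A skeleton under the Spark 3.0 connector.catalog API, with a hypothetical class name and all bodies stubbed (a real connector would delegate to the Hive metastore client):

import java.util
import org.apache.spark.sql.connector.catalog.{Identifier, Table, TableCatalog, TableChange}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Hypothetical skeleton of a Hive TableCatalog plugin.
class HiveTableCatalog extends TableCatalog {
  private var catalogName: String = _

  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    catalogName = name // a real plugin would connect to the metastore here
  }

  override def name(): String = catalogName

  override def listTables(namespace: Array[String]): Array[Identifier] = ???
  override def loadTable(ident: Identifier): Table = ???
  override def createTable(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: util.Map[String, String]): Table = ???
  override def alterTable(ident: Identifier, changes: TableChange*): Table = ???
  override def dropTable(ident: Identifier): Boolean = ???
  override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit = ???
}

// Registration (catalog name hypothetical), e.g. in spark-defaults.conf:
//   spark.sql.catalog.hive_v2 = com.example.HiveTableCatalog
// after which tables resolve as: SELECT * FROM hive_v2.db.tbl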