Re: Implementing TableProvider in Spark 3.0

2020-07-08 Thread Richard Xin
Saw. On Wednesday, July 8, 2020, 9:26 PM, Sricheta Ruj wrote: Hello Spark Team, I am trying to use the DataSourceV2 API from Spark 3.0. I wanted to ask: in the case of write, how do I get the user-specified schema? This is what I am trying to

Re: tcps oracle connection from spark

2019-06-18 Thread Richard Xin
And by the way, the same connection string works fine when used in SQL Developer. On Tuesday, June 18, 2019, 03:49:24 PM PDT, Richard Xin wrote: Hi, I need help with a tcps Oracle connection from Spark (version: spark-2.4.0-bin-hadoop2.7): Properties prop = new Properties(); prop.putAll(sparkOracle

tcps oracle connection from spark

2019-06-18 Thread Richard Xin
Hi, I need help with a tcps Oracle connection from Spark (version: spark-2.4.0-bin-hadoop2.7): Properties prop = new Properties(); prop.putAll(sparkOracle); // username/password prop.put("javax.net.ssl.trustStore", "path to root.jks"); prop.put("javax.net.ssl.trustStorePassword", "password_here");
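For reference, the usual shape of a TCPS JDBC setup is an Oracle descriptor URL with PROTOCOL=tcps plus the truststore properties quoted in the thread. A minimal sketch in plain Java; the host, port, service name, and truststore path are placeholder assumptions, not values from the thread:

```java
import java.util.Properties;

public class TcpsProps {
    public static void main(String[] args) {
        Properties prop = new Properties();
        // SSL truststore settings as in the thread; path and password are placeholders
        prop.put("javax.net.ssl.trustStore", "/path/to/root.jks");
        prop.put("javax.net.ssl.trustStorePassword", "password_here");

        // Hypothetical TCPS descriptor URL; host, port, and service name are assumptions
        String url = "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)"
                + "(HOST=db.example.com)(PORT=2484))"
                + "(CONNECT_DATA=(SERVICE_NAME=orcl)))";

        System.out.println(url.contains("PROTOCOL=tcps")); // prints true
    }
}
```

This Properties object would then be passed to the JDBC read, e.g. via sqlContext.read().jdbc(url, table, prop).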

Re: spark-cassandra-connector_2.1 caused java.lang.NoClassDefFoundError under Spark 2.4.2?

2019-05-06 Thread Richard Xin
On Monday, May 6, 2019, 18:34, Russell Spitzer wrote: Scala version mismatch: Spark is shown at 2.12, but the connector only has a 2.11 release. On Mon, May 6, 2019, 7:59 PM Richard Xin wrote: org.apache.spark : spark-core_2.12 : 2.4.0 (compile)

spark-cassandra-connector_2.1 caused java.lang.NoClassDefFoundError under Spark 2.4.2?

2019-05-06 Thread Richard Xin
org.apache.spark : spark-core_2.12 : 2.4.0 (compile); org.apache.spark : spark-sql_2.12 : 2.4.0; com.datastax.spark : spark-cassandra-connector_2.11 : 2.4.1. When I run spark-submit I get the following exceptions on Spark 2.4.2; it works fine when running spark-submit under
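Russell's diagnosis in the reply is the Scala binary suffix on the Maven artifactIds: the Spark artifacts are _2.12 while the connector is _2.11. That consistency check can be sketched in plain Java; the artifact names are taken from the thread, but the helper itself is hypothetical:

```java
public class ScalaSuffixCheck {
    // Extract the Scala binary version suffix (e.g. "2.12") from a Maven artifactId
    static String scalaBinaryVersion(String artifactId) {
        int i = artifactId.lastIndexOf('_');
        return i < 0 ? null : artifactId.substring(i + 1);
    }

    public static void main(String[] args) {
        String[] artifacts = {
            "spark-core_2.12", "spark-sql_2.12", "spark-cassandra-connector_2.11"
        };
        for (String a : artifacts) {
            System.out.println(a + " -> " + scalaBinaryVersion(a));
        }
        // The 2.12 vs 2.11 mismatch above is what surfaces as
        // NoClassDefFoundError at runtime: all Scala artifacts on the
        // classpath must share one binary version.
    }
}
```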

how sequence of chained jars in spark.(driver/executor).extraClassPath matters

2017-09-13 Thread Richard Xin
So let's say I have a chained path in spark.driver.extraClassPath/spark.executor.extraClassPath such as /path1/*:/path2/*, and I have different versions of the same jar under those two directories. How does Spark pick which version of the jar to use? Does it take the one from /path1/*? Thanks.
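Assuming the extraClassPath entries end up on an ordinary JVM classpath, the standard JVM rule applies: entries are searched left to right and the first match wins. A sketch of that JVM behavior (not of Spark itself), using temporary directories to stand in for /path1 and /path2:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class ClasspathOrder {
    public static void main(String[] args) throws IOException {
        // Simulate /path1 and /path2 each holding a different "version" of the same file
        Path dir1 = Files.createTempDirectory("path1");
        Path dir2 = Files.createTempDirectory("path2");
        Files.write(dir1.resolve("version.txt"), "v1".getBytes());
        Files.write(dir2.resolve("version.txt"), "v2".getBytes());

        // A URLClassLoader searches its URLs in order; the first match wins
        try (URLClassLoader cl = new URLClassLoader(
                new URL[]{dir1.toUri().toURL(), dir2.toUri().toURL()}, null)) {
            try (Scanner s = new Scanner(cl.getResourceAsStream("version.txt"))) {
                System.out.println(s.nextLine()); // dir1 is listed first, so "v1"
            }
        }
    }
}
```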

can I do spark-submit --jars [s3://bucket/folder/jar_file]? or --jars

2017-07-28 Thread Richard Xin
Can we add an extra library (jars on S3) to spark-submit? If yes, how? For example via --jars, extraClassPath, or extraLibPath. Thanks, Richard
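As a hedged sketch: spark-submit's --jars takes a comma-separated list of URIs, and on cluster managers with a configured Hadoop S3 filesystem (e.g. hadoop-aws providing s3a://) remote jar URIs are commonly used; whether a plain s3:// scheme resolves depends on the deployment. The class name, bucket, and jar names below are placeholders:

```shell
spark-submit \
  --class com.example.Main \
  --jars s3a://my-bucket/folder/extra-lib.jar \
  app.jar
```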

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Richard Xin
I believe you could use JOLT (bazaarvoice/jolt), a JSON-to-JSON transformation library written in Java, to flatten it to a JSON string and then convert that to a DataFrame or Dataset. On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan
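JOLT performs spec-driven JSON-to-JSON transforms; the core flattening step it would do here can be sketched library-free by recursing over nested maps and joining key paths with dots. All names and values below are illustrative, not from the thread:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Flatten {
    // Recursively flatten nested maps into dotted column names,
    // e.g. {"location": {"latitude": 37.8}} -> {"location.latitude": 37.8}
    static void flatten(String prefix, Map<String, Object> in, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : in.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) e.getValue();
                flatten(key, nested, out);
            } else {
                out.put(key, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> loc = new LinkedHashMap<>();
        loc.put("longitude", -122.4);
        loc.put("latitude", 37.8);
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "John");
        row.put("location", loc);

        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", row, flat);
        System.out.println(flat);
    }
}
```

The flattened keys then map one-to-one onto DataFrame column names.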

Re: apache-spark: Converting List of Rows into Dataset Java

2017-03-28 Thread Richard Xin
JavaRDD<Row> jsonRDD = new JavaSparkContext(sparkSession.sparkContext()).parallelize(results); Dataset<Row> peopleDF = sparkSession.createDataFrame(jsonRDD, Row.class); Richard Xin On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <ka...@datapine.com> wrote:

Re: Issue creating row with java.util.Map type

2017-01-27 Thread Richard Xin
Try: Row newRow = RowFactory.create(row.getString(0), row.getString(1), row.getMap(2)); On Friday, January 27, 2017 10:52 AM, Ankur Srivastava wrote: + DEV Mailing List On Thu, Jan 26, 2017 at 5:12 PM, Ankur Srivastava wrote:

is partitionBy of DataFrameWriter supported in 1.6.x?

2017-01-18 Thread Richard Xin
I found contradictions between the 1.6.0 and 2.1.x documentation. In http://spark.apache.org/docs/1.6.0/api/scala/index.html#org.apache.spark.sql.DataFrameWriter it says: "This is only applicable for Parquet at the moment." In

Re: DataFrame to read json and include raw Json in DataFrame

2016-12-29 Thread Richard Xin
create a SparkSession and how to programmatically load data: Spark SQL and DataFrames - Spark 2.1.0 Documentation. On Thursday, December 29, 2016 5:16 PM, Richard Xin <richardxin...@yahoo.com.INVAL

DataFrame to read json and include raw Json in DataFrame

2016-12-29 Thread Richard Xin
Say I have the following data in a file: {"id":1234,"ln":"Doe","fn":"John","age":25} {"id":1235,"ln":"Doe","fn":"Jane","age":22} Java code snippet: final SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("json_test"); JavaSparkContext ctx = new

Re: access Broadcast Variables in Spark java

2016-12-20 Thread Richard Xin
Try this: JavaRDD mapr = listrdd.map(x -> broadcastVar.value().get(x)); On Wednesday, December 21, 2016 2:25 PM, Sateesh Karuturi wrote: I need to process Spark broadcast variables using the Java RDD API. This is the code I have tried so far: This is only
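Conceptually, a broadcast variable is a read-only value shipped once to each executor and then read inside the map function. Outside Spark, the same shape is just a lookup map captured by a lambda, mirroring the listrdd.map(x -> broadcastVar.value().get(x)) line above; the data here is made up:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BroadcastSketch {
    public static void main(String[] args) {
        // Stand-in for broadcastVar.value(): a read-only lookup table
        Map<String, Integer> lookup = new HashMap<>();
        lookup.put("a", 1);
        lookup.put("b", 2);

        // Stand-in for listrdd.map(x -> broadcastVar.value().get(x))
        List<Integer> mapped = Arrays.asList("a", "b", "a").stream()
                .map(lookup::get)
                .collect(Collectors.toList());
        System.out.println(mapped); // prints [1, 2, 1]
    }
}
```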

Re: How to get recent value in spark dataframe

2016-12-18 Thread Richard Xin
I am not sure I understood your logic, but it seems to me that you could take a look at Hive's Lead/Lag functions. On Monday, December 19, 2016 1:41 AM, Milin korath wrote: Thanks, I tried with a left outer join. My dataset has around 400M records and a lot
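LEAD/LAG are window functions that let each row read a neighboring row's value without the self-join the follow-up describes. A minimal, engine-free illustration of LAG(value, 1) semantics on an ordered list (the values are made up):

```java
import java.util.Arrays;
import java.util.List;

public class LagDemo {
    public static void main(String[] args) {
        // lag(value, 1): each row sees the previous row's value in the ordering;
        // the first row has no predecessor, so its lag is null
        List<Integer> values = Arrays.asList(10, 20, 30);
        for (int i = 0; i < values.size(); i++) {
            Integer lag = (i == 0) ? null : values.get(i - 1);
            System.out.println(values.get(i) + " lag=" + lag);
        }
    }
}
```

In Spark SQL the same thing is expressed with lag(col, 1) over a Window ordered within each key, avoiding the 400M-row join.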

Re: Java to show struct field from a Dataframe

2016-12-17 Thread Richard Xin
return row; } From: Richard Xin <richardxin...@yahoo.com> Sent: Saturday, December 17, 2016 8:53 PM To: Yong Zhang; zjp_j...@163.com; user Subject: Re: Java to show struct field from a Dataframe I tried to transform root |-- latitude: double (nullable = false) |-- longitude: double (null

Re: Java to show struct field from a Dataframe

2016-12-17 Thread Richard Xin
le, as your schema defined. The error message clearly indicates the data doesn't match the type specified in the schema. I wonder how you can be so sure about your data? Did you check it with another tool? Yong From: Richard Xin <richardxin...@yahoo.com.INVALID> Sent: Saturday, Dece

Re: Java to show struct field from a Dataframe

2016-12-17 Thread Richard Xin
The data is good. On Saturday, December 17, 2016 11:50 PM, "zjp_j...@163.com" wrote:

Java to show struct field from a Dataframe

2016-12-17 Thread Richard Xin
Let's say I have a DataFrame with the following schema:
root
 |-- name: string (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- latitude: double (nullable = true)
df.show() throws the following exception: java.lang.ClassCastException:

Re: need help to have a Java version of this scala script

2016-12-17 Thread Richard Xin
iate function: import static org.apache.spark.sql.functions.callUDF; import static org.apache.spark.sql.functions.col; udf should be callUDF, e.g. ds.withColumn("localMonth", callUDF("toLocalMonth", col("unixTs"), col("tz"))) On 17 December 2016 at 09:54, Richa

need help to have a Java version of this scala script

2016-12-16 Thread Richard Xin
What I am trying to do: I need to add a column (possibly a complicated transformation based on the value of another column) to a given DataFrame. Scala script: val hContext = new HiveContext(sc) import hContext.implicits._ val df = hContext.sql("select x,y,cluster_no from test.dc") val len = udf((str: String) =>