Using a UDF (defined in Java) in Scala through spark-shell

2015-09-29 Thread ogoh
Hello, I have a UDF declared in Java but I'd like to call it from spark-shell, which only supports Scala. Since I am new to Scala, I couldn't figure out how to register the Java UDF using sqlContext.udf.register in Scala. Below is what I tried. I appreciate any help. Thanks, = my UDF in
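For reference, a minimal sketch of what registering a Java UDF from spark-shell can look like; com.example.MyUpper, the "people" table, and the SQL name my_upper are placeholders, not the poster's actual code. sqlContext.udf.register accepts a Java UDF1 instance together with its return DataType:

  import org.apache.spark.sql.types.StringType

  // com.example.MyUpper is a placeholder Java class implementing UDF1<String, String>,
  // shipped to spark-shell with --jars.
  sqlContext.udf.register("my_upper", new com.example.MyUpper(), StringType)
  sqlContext.sql("SELECT my_upper(name) FROM people").show()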

SparkSQL 1.4 can't accept registration of UDF?

2015-07-14 Thread ogoh
Hello, I am using SparkSQL along with the ThriftServer so that we can access it using Hive queries. With Spark 1.3.1, I can register a UDF function. But Spark 1.4.0 doesn't work for that. The jar of the UDF is the same. Below are the logs. I appreciate any advice. == With Spark 1.4 Beeline version 1.4.0 by
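For comparison, the Hive-style registration path through a HiveContext looks roughly like the sketch below; the jar path and class name com.example.MyUDF are placeholders, not the poster's actual UDF:

  val hc = new org.apache.spark.sql.hive.HiveContext(sc)
  hc.sql("ADD JAR /path/to/my-udf.jar")
  hc.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF'")
  hc.sql("SELECT my_udf(col) FROM some_table").show()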

Re: Error when connecting to Spark SQL via Hive JDBC driver

2015-06-18 Thread ogoh
Hello, I am not sure what is wrong, but in my case I followed the instructions from http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HiveJDBCDriver.html. It worked fine with SQuirreL SQL Client (http://squirrel-sql.sourceforge.net/) and SQL Workbench J
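For anyone hitting the same thing, a minimal Scala sketch of a JDBC connection to the Spark ThriftServer through the Hive driver; host, port, database and credentials below are placeholders:

  import java.sql.DriverManager

  // Assumes the Hive JDBC driver jar is on the classpath and the ThriftServer
  // listens on the default port 10000.
  Class.forName("org.apache.hive.jdbc.HiveDriver")
  val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hadoop", "")
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SHOW TABLES")
  while (rs.next()) println(rs.getString(1))
  conn.close()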

SparkSQL: using Hive UDF returning Map throws error: scala.MatchError: interface java.util.Map (of class java.lang.Class) (state=,code=0)

2015-06-04 Thread ogoh
Hello, I tested some custom UDFs on SparkSQL's ThriftServer via Beeline (Spark 1.3.1). Some UDFs work fine (accessing array parameters and returning int or string types). But my UDF returning a map type throws an error: Error: scala.MatchError: interface java.util.Map (of class java.lang.Class)
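As a point of comparison (not the original Hive UDF), a plain Scala UDF returning a Scala Map registers without this MatchError, since Spark infers MapType from the Scala signature; a minimal sketch with a hypothetical function name:

  // Spark infers MapType(StringType, StringType) from the Scala signature.
  sqlContext.udf.register("to_map", (k: String, v: String) => Map(k -> v))
  sqlContext.sql("SELECT to_map('key', 'value')").show()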

SparkSQL's performance degrades depending on the number of partitions of Hive tables... is it normal?

2015-06-01 Thread ogoh
is 6 ms. 2015-05-25 16:37:44 DEBUG Client:693 - Connecting to /10.128.193.211:9000 2015-05-25 16:37:44 DEBUG Client:1007 - IPC Client (2100771791) connection to /10.128.193.211:9000 from ogoh sending #151 2015-05-25 16:37:44 DEBUG Client:944 - IPC Client (2100771791) connection

Re: Spark 1.3.0 - 1.3.1 produces java.lang.NoSuchFieldError: NO_FILTER

2015-05-30 Thread ogoh
I had the same issue on AWS EMR with Spark 1.3.1.e (the AWS version) when the '-h' parameter (a bootstrap action parameter for Spark) was passed. I don't see the problem with Spark 1.3.1.e when not passing the parameter. I am not sure about your env. Thanks,

SparkSQL's performance: contacting namenode and datanode to unnecessarily check all partitions for a query on specific partitions

2015-05-25 Thread ogoh
(2100771791) connection to /10.128.193.211:9000 from ogoh sending #151 2015-05-25 16:37:44 DEBUG Client:944 - IPC Client (2100771791) connection to /10.128.193.211:9000 from ogoh: starting, having connections 2 2015-05-25 16:37:44 DEBUG Client:1064 - IPC Client (2100771791) connection

SparkSQL can't read S3 path for hive external table

2015-05-23 Thread ogoh
Hello, I am using Spark 1.3 in AWS. SparkSQL can't recognize a Hive external table on S3. The following is the error message. I appreciate any help. Thanks, Okehee -- 15/05/24 01:02:18 ERROR thriftserver.SparkSQLDriver: Failed in [select count(*) from api_search where pdate='2015-05-08']
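For context, the table involved is an external table over S3 of roughly this shape; only the table name, partition column and query come from the post, while the bucket path and column list below are placeholders, and hiveContext is assumed to be a HiveContext:

  hiveContext.sql("""
    CREATE EXTERNAL TABLE api_search (query STRING, hits INT)
    PARTITIONED BY (pdate STRING)
    LOCATION 's3n://my-bucket/api_search/'
  """)
  hiveContext.sql("SELECT count(*) FROM api_search WHERE pdate = '2015-05-08'").show()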

SparkSQL failing while writing into S3 for 'insert into table'

2015-05-22 Thread ogoh
Hello, I am using Spark 1.3 with Hive 0.13.1 in AWS. From Spark-SQL, when running a Hive query to export the query result into AWS S3, it failed with the following message: == org.apache.hadoop.hive.ql.metadata.HiveException: checkPaths:

Beeline that comes with Spark 1.3.0 doesn't work with --hiveconf or --hivevar, which substitute variables in Hive scripts.

2015-04-22 Thread ogoh
Hello, I am using Spark 1.3 for the SparkSQL (Hive) ThriftServer and Beeline. Beeline doesn't work with --hiveconf or --hivevar, which substitute variables in Hive scripts. I found the following JIRAs saying that Hive 0.13 resolved that issue. I wonder if this is a well-known issue?

Generating a schema in Spark 1.3 failed while using DataTypes.

2015-04-02 Thread ogoh
Hello, My ETL uses SparkSQL to generate Parquet files, which are served through the ThriftServer using Hive QL. In particular, it defines a schema programmatically since the schema is only known at runtime. With Spark 1.2.1, it worked fine (followed
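In Spark 1.3 the Scala data type classes moved to org.apache.spark.sql.types, which is one common source of breakage when coming from 1.2.x. A minimal sketch of building a schema at runtime in Spark 1.3; field names, sample rows and the output path are placeholders:

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

  // Schema built at runtime; the field names/types would normally be discovered elsewhere.
  val schema = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age",  IntegerType, nullable = true)))

  val rows = sc.parallelize(Seq(Row("alice", 30), Row("bob", 25)))
  val df = sqlContext.createDataFrame(rows, schema)
  df.saveAsParquetFile("/tmp/people.parquet")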

SparkSQL supports hive insert overwrite directory?

2015-03-06 Thread ogoh
; TOK_QUERY TOK_FROM TOK_TABREF TOK_TABNAME temptable TOK_INSERT TOK_DESTINATION TOK_DIR '/user/ogoh/table' TOK_SELECT TOK_SELEXPR TOK_ALLCOLREF scala.NotImplementedError: No parse rules for: TOK_DESTINATION TOK_DIR '/user/bob/table
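The parse error suggests SparkSQL's HiveQL parser has no rule for INSERT OVERWRITE DIRECTORY at this point. A sketch of one possible workaround, writing the query result out from the DataFrame side instead; the output path and delimiter are placeholders:

  // The failing statement, for reference:
  // hiveContext.sql("INSERT OVERWRITE DIRECTORY '/user/ogoh/table' SELECT * FROM temptable")

  // Possible workaround: run the SELECT and write the rows out directly.
  val result = hiveContext.sql("SELECT * FROM temptable")
  result.rdd.map(_.mkString("\t")).saveAsTextFile("/user/ogoh/table_out")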

Hive on Spark vs. SparkSQL using Hive?

2015-01-28 Thread ogoh
://spark.apache.org/docs/latest/sql-programming-guide.html)? Also, is there any update about SparkSQL's next release (current one is still alpha)? Thanks, OGoh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hive-on-Spark-vs-SparkSQL-using-Hive-tp21412.html