Re: Spark 1.1 doesn't work with Hive context
It was my mistake: I had somehow set the io.compression.codec property to the class mentioned above. Removing it resolved the problem.

Thanks and Regards,
Sankar S.

On Wednesday, 27 August 2014, 1:23, S Malligarjunan smalligarju...@yahoo.com wrote:

Hello all,

I have just checked out branch-1.1 and executed the commands below:

./bin/spark-shell --driver-memory 1G

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)

I am getting the following exception:

Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:175)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 72 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found

Thanks and Regards,
Sankar S.
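For readers hitting the same error, the misconfiguration described above can be reproduced and checked outside Spark. A minimal sketch, assuming a standard Hadoop setup (the config file here is a throwaway temp copy, not a real cluster file): if io.compression.codecs names a codec class whose jar is not on the classpath, every TextInputFormat job fails with the ClassNotFoundException shown, and the fix is to remove the entry or supply the hadoop-lzo jar.

```shell
# Recreate the problematic setting in a throwaway core-site.xml
# (values illustrative; the real file lives under the Hadoop conf dir).
conf=$(mktemp -d)
cat > "$conf/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
EOF

# If this matches, either delete the LZO entry or launch the shell with
# the codec's jar, e.g.: ./bin/spark-shell --jars /path/to/hadoop-lzo.jar
if grep -q 'com.hadoop.compression.lzo.LzoCodec' "$conf/core-site.xml"; then
  echo "LZO codec configured"
fi
```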
Re: SPARK Hive Context UDF Class Not Found Exception
Hello Michael,

I have executed git pull now. As per the pom version entry, it is 1.1.0-SNAPSHOT.

Thanks and Regards,
Sankar S.

On Tuesday, 26 August 2014, 1:00, Michael Armbrust mich...@databricks.com wrote:

Which version of Spark SQL are you using? Several issues with custom Hive UDFs have been fixed in 1.1.

On Mon, Aug 25, 2014 at 9:57 AM, S Malligarjunan smalligarju...@yahoo.com.invalid wrote:

Hello All,

I have added a jar from an S3 bucket to the classpath. I have tried the following options:

1. sc.addJar("s3n://mybucket/lib/myUDF.jar")
2. hiveContext.sparkContext.addJar("s3n://mybucket/lib/myUDF.jar")
3. ./bin/spark-shell --jars s3n://mybucket/lib/myUDF.jar

I am getting a ClassNotFoundException when trying to create a temporary function. What would be the issue here?

Thanks and Regards,
Sankar S.
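One workaround sketch for the ClassNotFoundException above (the local path and UDF class name are hypothetical, not from the thread): fetch the jar to the driver machine first, since remote s3n:// jar URLs were not reliably added to the classpath by any of the three mechanisms at the time, then pass the local copy with --jars at launch.

```shell
# Assumption: hadoop fs can read the s3n:// bucket configured for the cluster.
hadoop fs -get s3n://mybucket/lib/myUDF.jar /tmp/myUDF.jar
./bin/spark-shell --jars /tmp/myUDF.jar
# Then, inside the shell (class name is illustrative):
#   hiveContext.hql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF'")
```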
Spark 1.1 doesn't work with Hive context
Hello all,

I have just checked out branch-1.1 and executed the commands below:

./bin/spark-shell --driver-memory 1G

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)

I am getting the following exception:

Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:175)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 72 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found

Thanks and Regards,
Sankar S.
SPARK Hive Context UDF Class Not Found Exception
Hello All,

I have added a jar from an S3 bucket to the classpath. I have tried the following options:

1. sc.addJar("s3n://mybucket/lib/myUDF.jar")
2. hiveContext.sparkContext.addJar("s3n://mybucket/lib/myUDF.jar")
3. ./bin/spark-shell --jars s3n://mybucket/lib/myUDF.jar

I am getting a ClassNotFoundException when trying to create a temporary function. What would be the issue here?

Thanks and Regards,
Sankar S.
Re: Spark SQL Parser error
Hello Yin,

An additional note: with ./bin/spark-shell --jars s3n:/mybucket/myudf.jar I got the following message in the console:

Warning: skipped external jar ...

Thanks and Regards,
Sankar S.

On , S Malligarjunan smalligarju...@yahoo.com wrote:

Hello Yin,

I have tried sc.addJar, hiveContext.sparkContext.addJar, and the ./bin/spark-shell --jars option. With all three options, when I try to create a temporary function I get the ClassNotFoundException. What would be the issue here?

Thanks and Regards,
Sankar S.

On Saturday, 23 August 2014, 0:53, Yin Huai huaiyin@gmail.com wrote:

Hello Sankar,

ADD JAR in SQL is not supported at the moment. We are working on it (https://issues.apache.org/jira/browse/SPARK-2219). For now, can you try SparkContext.addJar, or use --jars your-jar when launching the Spark shell?

Thanks,
Yin

On Fri, Aug 22, 2014 at 2:01 PM, S Malligarjunan smalligarju...@yahoo.com wrote:

Hello Yin/All,

@Yin - Thanks for helping. I solved the SQL parser error. I am getting the following exception now:

scala> hiveContext.hql("ADD JAR s3n://hadoop.anonymous.com/lib/myudf.jar");
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
14/08/22 17:58:55 INFO SessionState: converting to local s3n://hadoop.anonymous.com/lib/myudf.jar
14/08/22 17:58:56 ERROR SessionState: Unable to register /tmp/3d273a4c-0494-4bec-80fe-86aa56f11684_resources/myudf.jar
Exception: org.apache.spark.repl.SparkIMain$TranslatingClassLoader cannot be cast to java.net.URLClassLoader
java.lang.ClassCastException: org.apache.spark.repl.SparkIMain$TranslatingClassLoader cannot be cast to java.net.URLClassLoader
    at org.apache.hadoop.hive.ql.exec.Utilities.addToClassPath(Utilities.java:1680)

Thanks and Regards,
Sankar S.

On Friday, 22 August 2014, 22:53, S Malligarjunan smalligarju...@yahoo.com.INVALID wrote:

Hello Yin,

Forgot to mention one thing: the same query works fine in Hive and Shark.

Thanks and Regards,
Sankar S.
On , S Malligarjunan smalligarju...@yahoo.com wrote:

Hello Yin,

I have tried the CREATE EXTERNAL TABLE command as well. I get the same error. Please help me find the root cause.

Thanks and Regards,
Sankar S.

On Friday, 22 August 2014, 22:43, Yin Huai huaiyin@gmail.com wrote:

Hi Sankar,

You need to create an external table in order to specify the location of the data (i.e. using CREATE EXTERNAL TABLE user1 LOCATION). You can take a look at this page for reference.

Thanks,
Yin

On Thu, Aug 21, 2014 at 11:12 PM, S Malligarjunan smalligarju...@yahoo.com.invalid wrote:

Hello All,

When I execute the following query:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

CREATE TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')

I am getting the following error:

org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: CREATE TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '' LINES TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')
    at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:215)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:98)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:102)
    at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:22)
    at $iwC$$iwC$$iwC$$iwC.init(console:27)
    at $iwC$$iwC$$iwC.init(console:29)
    at $iwC$$iwC.init(console:31)
    at $iwC.init(console:33)
    at init(console:35)

Kindly let me know what the issue could be. I have cloned Spark from GitHub and am using Hadoop 1.0.3.

Thanks and Regards,
Sankar S.
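Following Yin's suggestion, a sketch of the corrected DDL: the same table definition with EXTERNAL added, which is required when a LOCATION clause points at existing data. This assumes the only intended change is the EXTERNAL keyword; the stray trailing parenthesis from the original statement is also dropped, since it alone could trigger the parse failure.

```sql
CREATE EXTERNAL TABLE user1 (
  time string, id string, u_id string, c_ip string, user_agent string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/';
```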
Re: Spark SQL Parser error
Hello Yin,

I have tried the CREATE EXTERNAL TABLE command as well. I get the same error. Please help me find the root cause.

Thanks and Regards,
Sankar S.

On Friday, 22 August 2014, 22:43, Yin Huai huaiyin@gmail.com wrote:

Hi Sankar,

You need to create an external table in order to specify the location of the data (i.e. using CREATE EXTERNAL TABLE user1 LOCATION). You can take a look at this page for reference.

Thanks,
Yin

On Thu, Aug 21, 2014 at 11:12 PM, S Malligarjunan smalligarju...@yahoo.com.invalid wrote:

Hello All,

When I execute the following query:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

CREATE TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')

I am getting the following error:

org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: CREATE TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '' LINES TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')
    at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:215)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:98)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:102)
    at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:22)
    at $iwC$$iwC$$iwC$$iwC.init(console:27)
    at $iwC$$iwC$$iwC.init(console:29)
    at $iwC$$iwC.init(console:31)
    at $iwC.init(console:33)
    at init(console:35)

Kindly let me know what the issue could be. I have cloned Spark from GitHub and am using Hadoop 1.0.3.

Thanks and Regards,
Sankar S.
Re: Spark SQL Parser error
Hello Yin,

Forgot to mention one thing: the same query works fine in Hive and Shark.

Thanks and Regards,
Sankar S.

On , S Malligarjunan smalligarju...@yahoo.com wrote:

Hello Yin,

I have tried the CREATE EXTERNAL TABLE command as well. I get the same error. Please help me find the root cause.

Thanks and Regards,
Sankar S.

On Friday, 22 August 2014, 22:43, Yin Huai huaiyin@gmail.com wrote:

Hi Sankar,

You need to create an external table in order to specify the location of the data (i.e. using CREATE EXTERNAL TABLE user1 LOCATION). You can take a look at this page for reference.

Thanks,
Yin

On Thu, Aug 21, 2014 at 11:12 PM, S Malligarjunan smalligarju...@yahoo.com.invalid wrote:

Hello All,

When I execute the following query:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

CREATE TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')

I am getting the following error:

org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: CREATE TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '' LINES TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')
    at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:215)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:98)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:102)
    at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:22)
    at $iwC$$iwC$$iwC$$iwC.init(console:27)
    at $iwC$$iwC$$iwC.init(console:29)
    at $iwC$$iwC.init(console:31)
    at $iwC.init(console:33)
    at init(console:35)

Kindly let me know what the issue could be. I have cloned Spark from GitHub and am using Hadoop 1.0.3.

Thanks and Regards,
Sankar S.
Spark SQL Parser error
Hello All,

When I execute the following query:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

CREATE TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')

I am getting the following error:

org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: CREATE TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '' LINES TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')
    at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:215)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:98)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:102)
    at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:22)
    at $iwC$$iwC$$iwC$$iwC.init(console:27)
    at $iwC$$iwC$$iwC.init(console:29)
    at $iwC$$iwC.init(console:31)
    at $iwC.init(console:33)
    at init(console:35)

Kindly let me know what the issue could be. I have cloned Spark from GitHub and am using Hadoop 1.0.3.

Thanks and Regards,
Sankar S.
Re: Need help on Spark UDF (Join) performance tuning
Hello Experts,

Your input is highly appreciated. Please suggest or give me a hint: what would be the issue here?

Thanks and Regards,
Malligarjunan S.

On Thursday, 17 July 2014, 22:47, S Malligarjunan smalligarju...@yahoo.com wrote:

Hello Experts,

I am facing a performance problem when I use a UDF in the join condition. Please help me tune the query. Details below.

shark> select count(*) from table1;
OK
151096
Time taken: 7.242 seconds

shark> select count(*) from table2;
OK
938
Time taken: 1.273 seconds

Without UDF:

shark> SELECT count(pvc1.time) FROM table2 pvc2 JOIN table1 pvc1
       WHERE pvc1.col1 = pvc2.col2
       AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
328
Time taken: 200.487 seconds

shark> SELECT count(pvc1.time) FROM table2 pvc2 JOIN table1 pvc1
       WHERE (pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2)
       AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 292.86 seconds

With UDF:

shark> SELECT count(pvc1.time) FROM table2 pvc2 JOIN table1 pvc1
       WHERE testCompare(pvc1.col1, pvc1.col2, pvc2.col1, pvc2.col2)
       AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 3718.23 seconds

The UDF query takes far longer to run. testCompare is a UDF that simply evaluates pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2. Please let me know what the issue is here.

Thanks and Regards,
Sankar S.
Need help on Spark UDF (Join) performance tuning
Hello Experts,

I am facing a performance problem when I use a UDF in the join condition. Please help me tune the query. Details below.

shark> select count(*) from table1;
OK
151096
Time taken: 7.242 seconds

shark> select count(*) from table2;
OK
938
Time taken: 1.273 seconds

Without UDF:

shark> SELECT count(pvc1.time) FROM table2 pvc2 JOIN table1 pvc1
       WHERE pvc1.col1 = pvc2.col2
       AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
328
Time taken: 200.487 seconds

shark> SELECT count(pvc1.time) FROM table2 pvc2 JOIN table1 pvc1
       WHERE (pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2)
       AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 292.86 seconds

With UDF:

shark> SELECT count(pvc1.time) FROM table2 pvc2 JOIN table1 pvc1
       WHERE testCompare(pvc1.col1, pvc1.col2, pvc2.col1, pvc2.col2)
       AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 3718.23 seconds

The UDF query takes far longer to run. testCompare is a UDF that simply evaluates pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2. Please let me know what the issue is here.

Thanks and Regards,
Sankar S.