
Re: Spark 1.1. doesn't work with hive context

2014-08-27 Thread S Malligarjunan
It was my mistake; somehow I had set the io.compression.codecs property to the 
above-mentioned class. The problem is resolved now.
 
Thanks and Regards,
Sankar S.  



On Wednesday, 27 August 2014, 1:23, S Malligarjunan smalligarju...@yahoo.com 
wrote:
 


Hello all,

I have just checked out branch-1.1 
and executed the commands below:
./bin/spark-shell --driver-memory 1G

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)

I am getting the following exception

Caused by: java.lang.IllegalArgumentException: Compression codec 
com.hadoop.compression.lzo.LzoCodec not found.
at 
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
at 
org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:175)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 72 more
Caused by: java.lang.ClassNotFoundException: Class 
com.hadoop.compression.lzo.LzoCodec not found

 
Thanks and Regards,
Sankar S.  
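The resolution above amounts to removing the LZO codec from Hadoop's codec list 
(or, alternatively, putting the hadoop-lzo jar on the classpath). A sketch of the 
relevant core-site.xml fragment, with illustrative codec values:

```xml
<!-- Illustrative core-site.xml fragment: dropping com.hadoop.compression.lzo.LzoCodec
     from the codec list avoids the ClassNotFoundException when the lzo jar is absent -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```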

Re: SPARK Hive Context UDF Class Not Found Exception,

2014-08-26 Thread S Malligarjunan
Hello Michael,

I have executed git pull now. As per the pom version entry, it is 1.1.0-SNAPSHOT.

 
Thanks and Regards,
Sankar S.  



On Tuesday, 26 August 2014, 1:00, Michael Armbrust mich...@databricks.com 
wrote:
 


Which version of Spark SQL are you using? Several issues with custom Hive UDFs 
have been fixed in 1.1.



On Mon, Aug 25, 2014 at 9:57 AM, S Malligarjunan 
smalligarju...@yahoo.com.invalid wrote:

Hello All,


I have added a jar from S3 to the classpath. I have tried the following 
options:
1. sc.addJar("s3n://mybucket/lib/myUDF.jar")
2. hiveContext.sparkContext.addJar("s3n://mybucket/lib/myUDF.jar")
3. ./bin/spark-shell --jars s3n://mybucket/lib/myUDF.jar


I am getting a ClassNotFoundException when trying to create a temporary function.


What would be the issue here?
 
Thanks and Regards,
Sankar S.  



Spark 1.1. doesn't work with hive context

2014-08-26 Thread S Malligarjunan
Hello all,

I have just checked out branch-1.1 
and executed the commands below:
./bin/spark-shell --driver-memory 1G

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)

I am getting the following exception

Caused by: java.lang.IllegalArgumentException: Compression codec 
com.hadoop.compression.lzo.LzoCodec not found.
at 
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
at 
org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:175)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 72 more
Caused by: java.lang.ClassNotFoundException: Class 
com.hadoop.compression.lzo.LzoCodec not found

 
Thanks and Regards,
Sankar S.  


SPARK Hive Context UDF Class Not Found Exception,

2014-08-25 Thread S Malligarjunan
Hello All,

I have added a jar from S3 to the classpath. I have tried the following 
options:
1. sc.addJar("s3n://mybucket/lib/myUDF.jar")
2. hiveContext.sparkContext.addJar("s3n://mybucket/lib/myUDF.jar")
3. ./bin/spark-shell --jars s3n://mybucket/lib/myUDF.jar

I am getting a ClassNotFoundException when trying to create a temporary function.

What would be the issue here?
 
Thanks and Regards,
Sankar S.  


Re: Spark SQL Parser error

2014-08-24 Thread S Malligarjunan
Hello Yin,

Additional note:
In ./bin/spark-shell --jars s3n:/mybucket/myudf.jar I got the following 
message in the console:
Warning: skipped external jar...
 
Thanks and Regards,
Sankar S.  



On , S Malligarjunan smalligarju...@yahoo.com wrote:
 


Hello Yin,

I have tried sc.addJar, hiveContext.sparkContext.addJar, and the 
./bin/spark-shell --jars option.

With all three options, when I try to create a temporary function I get a 
ClassNotFoundException. What would be the issue here?
 
Thanks and Regards,
Sankar S.  



On Saturday, 23 August 2014, 0:53, Yin Huai huaiyin@gmail.com wrote:
 


Hello Sankar,

ADD JAR in SQL is not supported at the moment. We are working on it 
(https://issues.apache.org/jira/browse/SPARK-2219). For now, can you try 
SparkContext.addJar, or --jars your-jar when launching the spark shell?

Thanks,

Yin 



On Fri, Aug 22, 2014 at 2:01 PM, S Malligarjunan smalligarju...@yahoo.com 
wrote:

Hello Yin/All.


@Yin - Thanks for helping. I solved the SQL parser error. I am getting the 
following exception now:


scala> hiveContext.hql("ADD JAR s3n://hadoop.anonymous.com/lib/myudf.jar");
warning: there were 1 deprecation warning(s); re-run with -deprecation for 
details
14/08/22 17:58:55 INFO SessionState: converting to local 
s3n://hadoop.anonymous.com/lib/myudf.jar
14/08/22 17:58:56 ERROR SessionState: Unable to register 
/tmp/3d273a4c-0494-4bec-80fe-86aa56f11684_resources/myudf.jar
Exception: org.apache.spark.repl.SparkIMain$TranslatingClassLoader cannot be 
cast to java.net.URLClassLoader
java.lang.ClassCastException: 
org.apache.spark.repl.SparkIMain$TranslatingClassLoader cannot be cast to 
java.net.URLClassLoader
at org.apache.hadoop.hive.ql.exec.Utilities.addToClassPath(Utilities.java:1680)


 
Thanks and Regards,
Sankar S.  





On Friday, 22 August 2014, 22:53, S Malligarjunan 
smalligarju...@yahoo.com.INVALID wrote:
 


Hello Yin,


Forgot to mention one thing: the same query works fine in Hive and Shark.
 
Thanks and Regards,
Sankar S.  





On , S Malligarjunan smalligarju...@yahoo.com wrote:
 


Hello Yin,


I have tried the CREATE EXTERNAL TABLE command as well. I get the same error.
Please help me find the root cause.
 
Thanks and Regards,
Sankar S.  





On Friday, 22 August 2014, 22:43, Yin Huai huaiyin@gmail.com wrote:
 


Hi Sankar,


You need to create an external table in order to specify the location of data 
(i.e. using CREATE EXTERNAL TABLE user1  LOCATION).  You can take a look 
at this page for reference. 


Thanks,


Yin
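Yin's suggestion, applied to the table definition from the original mail, would 
look roughly like the following (a sketch: the EXTERNAL keyword and a closing 
semicolon replace the stray parenthesis; the column list and S3 path are taken 
verbatim from the thread):

```sql
-- Sketch: an external table so the LOCATION clause is accepted (path from the thread)
CREATE EXTERNAL TABLE user1 (time string, id string, u_id string, c_ip string, user_agent string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/';
```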



On Thu, Aug 21, 2014 at 11:12 PM, S Malligarjunan 
smalligarju...@yahoo.com.invalid wrote:

Hello All,


When I execute the following query 




val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)


CREATE TABLE user1 (time string, id string, u_id string, c_ip string, 
user_agent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES 
TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 
's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')


I am getting the following error 
org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: CREATE 
TABLE user1 (time string, id string, u_id string, c_ip string, user_agent 
string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '' LINES TERMINATED BY '
' STORED AS TEXTFILE LOCATION 
's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:215)
at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:98)
at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:102)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC.<init>(<console>:31)
at $iwC.<init>(<console>:33)
at <init>(<console>:35)


Kindly let me know what could be the issue here.


I have cloned Spark from GitHub and am using Hadoop 1.0.3.
 
Thanks and Regards,
Sankar S.  










Re: Spark SQL Parser error

2014-08-22 Thread S Malligarjunan
Hello Yin,

I have tried the CREATE EXTERNAL TABLE command as well. I get the same error.
Please help me find the root cause.
 
Thanks and Regards,
Sankar S.  



On Friday, 22 August 2014, 22:43, Yin Huai huaiyin@gmail.com wrote:
 


Hi Sankar,

You need to create an external table in order to specify the location of data 
(i.e. using CREATE EXTERNAL TABLE user1  LOCATION).  You can take a look at 
this page for reference. 

Thanks,

Yin



On Thu, Aug 21, 2014 at 11:12 PM, S Malligarjunan 
smalligarju...@yahoo.com.invalid wrote:

Hello All,


When I execute the following query 




val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)


CREATE TABLE user1 (time string, id string, u_id string, c_ip string, 
user_agent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES 
TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 
's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')


I am getting the following error 
org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: CREATE TABLE 
user1 (time string, id string, u_id string, c_ip string, user_agent string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '' LINES TERMINATED BY '
' STORED AS TEXTFILE LOCATION 
's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:215)
at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:98)
at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:102)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC.<init>(<console>:31)
at $iwC.<init>(<console>:33)
at <init>(<console>:35)


Kindly let me know what could be the issue here.


I have cloned Spark from GitHub and am using Hadoop 1.0.3.
 
Thanks and Regards,
Sankar S.  



Re: Spark SQL Parser error

2014-08-22 Thread S Malligarjunan
Hello Yin,

Forgot to mention one thing: the same query works fine in Hive and Shark.
 
Thanks and Regards,
Sankar S.  



On , S Malligarjunan smalligarju...@yahoo.com wrote:
 


Hello Yin,

I have tried the CREATE EXTERNAL TABLE command as well. I get the same error.
Please help me find the root cause.
 
Thanks and Regards,
Sankar S.  



On Friday, 22 August 2014, 22:43, Yin Huai huaiyin@gmail.com wrote:
 


Hi Sankar,

You need to create an external table in order to specify the location of data 
(i.e. using CREATE EXTERNAL TABLE user1  LOCATION).  You can take a look at 
this page for reference. 

Thanks,

Yin



On Thu, Aug 21, 2014 at 11:12 PM, S Malligarjunan 
smalligarju...@yahoo.com.invalid wrote:

Hello All,


When I execute the following query 




val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)


CREATE TABLE user1 (time string, id string, u_id string, c_ip string, 
user_agent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES 
TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 
's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')


I am getting the following error 
org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: CREATE TABLE 
user1 (time string, id string, u_id string, c_ip string, user_agent string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '' LINES TERMINATED BY '
' STORED AS TEXTFILE LOCATION 
's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:215)
at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:98)
at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:102)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC.<init>(<console>:31)
at $iwC.<init>(<console>:33)
at <init>(<console>:35)


Kindly let me know what could be the issue here.


I have cloned Spark from GitHub and am using Hadoop 1.0.3.
 
Thanks and Regards,
Sankar S.  



Spark SQL Parser error

2014-08-21 Thread S Malligarjunan
Hello All,

When I execute the following query 


val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

CREATE TABLE user1 (time string, id string, u_id string, c_ip string, 
user_agent string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES 
TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 
's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')

I am getting the following error 
org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: CREATE TABLE 
user1 (time string, id string, u_id string, c_ip string, user_agent string) ROW 
FORMAT DELIMITED FIELDS TERMINATED BY '' LINES TERMINATED BY '
' STORED AS TEXTFILE LOCATION 
's3n://hadoop.anonymous.com/output/qa/cnv_px_ip_gnc/ds=2014-06-14/')
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:215)
at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:98)
at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:102)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC.<init>(<console>:31)
at $iwC.<init>(<console>:33)
at <init>(<console>:35)

Kindly let me know what could be the issue here.

I have cloned Spark from GitHub and am using Hadoop 1.0.3.
 
Thanks and Regards,
Sankar S.  
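A plausible root cause, judging from the ParseException text above (it shows an
empty field delimiter and a literal line break where '\t' and '\n' were typed):
if the DDL was passed to hql as an ordinary Scala string literal, the \t and \n
escapes are consumed by the compiler before Hive's parser ever sees them, so
doubling the backslashes ("\\t", "\\n") would be the fix. A small Python sketch
of the same escaping pitfall (illustrative only; Scala string literals behave
the same way here):

```python
# In an ordinary string literal, "\t" and "\n" are single control characters,
# so a SQL parser receives a statement broken across a real line break.
interpreted = "FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'"

# Doubling the backslash keeps the two-character sequences \t and \n intact,
# which is what Hive's DDL parser expects to read.
escaped = "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"

print("\n" in interpreted)  # the interpreted form contains a real newline
print("\n" in escaped)      # the escaped form does not
```

This matches the error output, where TERMINATED BY '' appears with an invisible
tab and the statement is split mid-string by an actual newline.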


Re: Need help on Spark UDF (Join) Performance tuning .

2014-07-18 Thread S Malligarjunan
Hello Experts,

I would highly appreciate your input. Please suggest or give me a hint: what 
could be the issue here?

 
Thanks and Regards,
Malligarjunan S.  



On Thursday, 17 July 2014, 22:47, S Malligarjunan smalligarju...@yahoo.com 
wrote:
 


Hello Experts,

I am facing a performance problem when I use the UDF function call. Please help 
me to tune the query.
Please find the details below

shark> select count(*) from table1;
OK
151096
Time taken: 7.242 seconds
shark> select count(*) from table2;
OK
938
Time taken: 1.273 seconds

Without UDF:
shark> SELECT count(pvc1.time)
    FROM table2 pvc2 JOIN table1 pvc1
    WHERE pvc1.col1 = pvc2.col2
    AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
328
Time taken: 200.487 seconds


shark> SELECT count(pvc1.time)
    FROM table2 pvc2 JOIN table1 pvc1
    WHERE (pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2)
    AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 292.86 seconds

With UDF:
shark> SELECT count(pvc1.time)
    FROM table2 pvc2 JOIN table1 pvc1
    WHERE testCompare(pvc1.col1, pvc1.col2, pvc2.col1, pvc2.col2)
    AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 3718.23 seconds

The UDF query above takes roughly 12x longer to run (3718 seconds versus 293 
seconds for the equivalent inline predicate), where testCompare is a UDF that 
simply evaluates pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2.

Please let me know what the issue is here.

 
Thanks and Regards,
Sankar S.  
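A likely explanation for the slowdown (a sketch of the reasoning, not from the
thread): an opaque UDF in the WHERE clause cannot be recognized as a join
condition, so the engine falls back to evaluating it over the full cross
product of the two tables, whereas the inline equality predicates can drive a
hash-based equi-join. A rough Python sketch of the work involved, using the row
counts from the shark output above:

```python
# An opaque UDF predicate forces one evaluation per row pair (nested loop),
# while an explicit equality lets the engine build a hash table on the small
# side and probe it once per row of the large side.
rows_table1 = 151096   # count(*) from table1, taken from the thread
rows_table2 = 938      # count(*) from table2, taken from the thread

# Worst case for the UDF form: every row pair is tested.
udf_comparisons = rows_table1 * rows_table2

# Hash equi-join: one build pass over the small table, one probe per large row.
hash_join_work = rows_table2 + rows_table1

print(udf_comparisons)   # ~142 million row pairs
print(hash_join_work)    # ~152 thousand operations
```

A common rewrite, assuming the UDF really is just the disjunction shown above,
is to state the OR of equalities inline (the 292-second query) or to split it
into a UNION ALL of two equi-joins so each branch can use a hash join.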

Need help on Spark UDF (Join) Performance tuning .

2014-07-17 Thread S Malligarjunan
Hello Experts,

I am facing a performance problem when I use the UDF function call. Please help 
me to tune the query.
Please find the details below

shark> select count(*) from table1;
OK
151096
Time taken: 7.242 seconds
shark> select count(*) from table2;
OK
938
Time taken: 1.273 seconds

Without UDF:
shark> SELECT count(pvc1.time)
    FROM table2 pvc2 JOIN table1 pvc1
    WHERE pvc1.col1 = pvc2.col2
    AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
328
Time taken: 200.487 seconds


shark> SELECT count(pvc1.time)
    FROM table2 pvc2 JOIN table1 pvc1
    WHERE (pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2)
    AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 292.86 seconds

With UDF:
shark> SELECT count(pvc1.time)
    FROM table2 pvc2 JOIN table1 pvc1
    WHERE testCompare(pvc1.col1, pvc1.col2, pvc2.col1, pvc2.col2)
    AND unix_timestamp(pvc2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > 
unix_timestamp(pvc1.time, 'yyyy-MM-dd HH:mm:ss,SSS');
OK
331
Time taken: 3718.23 seconds

The UDF query above takes roughly 12x longer to run (3718 seconds versus 293 
seconds for the equivalent inline predicate), where testCompare is a UDF that 
simply evaluates pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2.

Please let me know what the issue is here.

 
Thanks and Regards,
Sankar S.