Error starting HiveServer2: could not start ThriftBinaryCLIService
Hi all,

I started the Hive Thrift server with this command:

./sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10003

The Thrift server started on that particular node without any error. When doing the same, except pointing to a different node to start the server:

./sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.bind.host=DIFFERENT_NODE_IP --hiveconf hive.server2.thrift.port=10003

I am getting the following error:

16/07/15 13:04:35 INFO service.AbstractService: Service:ThriftBinaryCLIService is started.
16/07/15 13:04:35 INFO service.AbstractService: Service:HiveServer2 is started.
16/07/15 13:04:35 INFO thriftserver.HiveThriftServer2: HiveThriftServer2 started
16/07/15 13:04:36 ERROR thrift.ThriftCLIService: Error:
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address DIFFERENT_NODE_IP:10003.
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:93)
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:79)
    at org.apache.hive.service.auth.HiveAuthFactory.getServerSocket(HiveAuthFactory.java:236)
    at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:69)
    at java.lang.Thread.run(Thread.java:745)

Can anyone help me with this?

Thanks,
Ram.
Re: Getting error in inputfile | inputFile
Check the "inputFile" variable name. lol

On Fri, Jul 15, 2016 at 12:12 PM, RK Spark wrote:

> I am using Spark version 1.5.1, and I am getting errors in the first program of Spark, i.e., word count. Please help me to solve this.
>
> scala> val inputfile = sc.textFile("input.txt")
> inputfile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[13] at textFile at <console>:21
>
> scala> val counts = inputFile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_ + _);
> <console>:19: error: not found: value inputFile
>        val counts = inputFile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_ + _);
>                     ^
Re: Spark with HBase Error - Py4JJavaError
Hi Puneet,

Have you tried appending --jars $SPARK_HOME/lib/spark-examples-*.jar to the execution command?

Ram

On Thu, Jul 7, 2016 at 5:19 PM, Puneet Tripathi <puneet.tripa...@dunnhumby.com> wrote:

> Guys, please can anyone help on the issue below?
>
> Puneet
>
> From: Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com]
> Sent: Thursday, July 07, 2016 12:42 PM
> To: user@spark.apache.org
> Subject: Spark with HBase Error - Py4JJavaError
>
> Hi,
>
> We are running HBase in fully distributed mode. I tried to connect to HBase via pyspark and then write to HBase using saveAsNewAPIHadoopDataset, but it failed. The error says:
>
> Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
> : java.lang.ClassNotFoundException: org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>
> I have been able to create pythonconverters.jar, and then did the below:
>
> 1. I think we have to copy this to a location on HDFS; /sparkjars/ seems as good a directory to create as any. I think the file has to be world readable.
> 2. Set the spark_jar_hdfs_path property in Cloudera Manager, e.g. hdfs:///sparkjars.
>
> It still doesn't seem to work. Can someone please help me with this?
>
> Regards,
> Puneet
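In case it helps to see how those converters are meant to be wired in, below is a rough PySpark sketch modeled on the hbase_outputformat example that ships with Spark. The ZooKeeper host, table name, and row contents are placeholders, and the converter classes live in the spark-examples jar, so that jar must be shipped with the job (e.g. via --jars as suggested above):

{{{
# Hypothetical sketch after Spark's examples/src/main/python/hbase_outputformat.py.
# "zk-host" and "test_table" are placeholders, not values from this thread.
conf = {
    "hbase.zookeeper.quorum": "zk-host",
    "hbase.mapred.outputtable": "test_table",
    "mapreduce.outputformat.class":
        "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
    "mapreduce.job.output.key.class":
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "mapreduce.job.output.value.class":
        "org.apache.hadoop.io.Writable",
}

# Each element: (rowkey, [rowkey, column_family, qualifier, value])
rows = sc.parallelize([("row1", ["row1", "f1", "q1", "value1"])])

rows.saveAsNewAPIHadoopDataset(
    conf=conf,
    keyConverter="org.apache.spark.examples.pythonconverters."
                 "StringToImmutableBytesWritableConverter",
    valueConverter="org.apache.spark.examples.pythonconverters."
                   "StringListToPutConverter",
)
}}}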
Re: Error joining dataframes
I tried it, e.g.:

df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")

+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+

If I try:

df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))

+----+----+----+
|   A|  id|   B|
+----+----+----+
|   0|null|null|
|   0|   2|   0|
|null|   3|   0|
+----+----+----+

the "Id" = 1 will be lost.

On Wed, May 18, 2016 at 1:52 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:

> Can you try
> var df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))
>
> On May 18, 2016 2:16 PM, "ram kumar" <ramkumarro...@gmail.com> wrote:
>
> I tried
>
> scala> var df_join = df1.join(df2, "Id", "fullouter")
> <console>:27: error: type mismatch;
>  found   : String("Id")
>  required: org.apache.spark.sql.Column
>        var df_join = df1.join(df2, "Id", "fullouter")
>                                    ^
>
> scala>
>
> And I can't see the above method in
> https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)
>
>> Hi,
>>
>> Try this one:
>>
>> df_join = df1.join(df2, 'Id', "fullouter")
>>
>> Thanks,
>> Bijay
>>
>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I tried to join two dataframes:
>>>
>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>> df_join.registerTempTable("join_test")
>>>
>>> When querying "Id" from "join_test":
>>>
>>> 0: jdbc:hive2://> select Id from join_test;
>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>> 0: jdbc:hive2://>
>>>
>>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>>
>>> Thanks
Re: Error joining dataframes
When you register a temp table from the dataframe, e.g.:

var df_join = df1.join(df2, df1("id") === df2("id"), "outer")
df_join.registerTempTable("test")
sqlContext.sql("select * from test")

+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+

but when you query the "id":

sqlContext.sql("select id from test")

Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)

On Wed, May 18, 2016 at 12:44 PM, Takeshi Yamamuro <linguin@gmail.com> wrote:

> Looks weird; it seems spark-v1.5.x can accept the query. What's the difference between the example and your query?
>
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>       /_/
>
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
>
> val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
> val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> // Exiting paste mode, now interpreting.
>
> +----+----+----+----+
> |  id|   A|  id|   B|
> +----+----+----+----+
> |   1|   0|null|null|
> |   2|   0|   2|   0|
> |null|null|   3|   0|
> +----+----+----+----+
>
> df1: org.apache.spark.sql.DataFrame = [id: int, A: int]
> df2: org.apache.spark.sql.DataFrame = [id: int, B: int]
>
> On Wed, May 18, 2016 at 3:52 PM, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> I tried
>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>
>> But there is a duplicate "id", and when I query the "id", I get
>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>
>> I am currently using Spark 1.5.2.
>> Is there any alternative way in 1.5?
>>
>> Thanks
>>
>> On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <linguin@gmail.com> wrote:
>>
>>> Also, you can pass the query that you'd like to use in spark-v1.6+;
>>>
>>> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
>>> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
>>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>>
>>> // maropu
>>>
>>> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarro...@gmail.com> wrote:
>>>
>>>> If I run it as
>>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>>
>>>> it takes it as an inner join.
>>>>
>>>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Pretty simple, a similar construct to tables projected as DF:
>>>>>
>>>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>>>> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Try this one:
>>>>>>
>>>>>> df_join = df1.join(df2, 'Id', "fullouter")
>>>>>>
>>>>>> Thanks,
>>>>>> Bijay
>>>>>>
>>>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tried to join two dataframes:
>>>>>>>
>>>>>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>>>>>> df_join.registerTempTable("join_test")
>>>>>>>
>>>>>>> When querying "Id" from "join_test":
>>>>>>>
>>>>>>> 0: jdbc:hive2://> select Id from join_test;
>>>>>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>>>> 0: jdbc:hive2://>
>>>>>>>
>>>>>>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>>>>>>
>>>>>>> Thanks
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>
> --
> ---
> Takeshi Yamamuro
Re: Error joining dataframes
I tried
df1.join(df2, df1("id") === df2("id"), "outer").show

But there is a duplicate "id", and when I query the "id", I get
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)

I am currently using Spark 1.5.2.
Is there any alternative way in 1.5?

Thanks

On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <linguin@gmail.com> wrote:

> Also, you can pass the query that you'd like to use in spark-v1.6+;
>
> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> // maropu
>
> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> If I run it as
>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>
>> it takes it as an inner join.
>>
>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Pretty simple, a similar construct to tables projected as DF:
>>>
>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Try this one:
>>>>
>>>> df_join = df1.join(df2, 'Id', "fullouter")
>>>>
>>>> Thanks,
>>>> Bijay
>>>>
>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to join two dataframes:
>>>>>
>>>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>>>> df_join.registerTempTable("join_test")
>>>>>
>>>>> When querying "Id" from "join_test":
>>>>>
>>>>> 0: jdbc:hive2://> select Id from join_test;
>>>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>> 0: jdbc:hive2://>
>>>>>
>>>>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>>>>
>>>>> Thanks

--
---
Takeshi Yamamuro
Re: Error joining dataframes
If I run it as
val rs = s.join(t,"time_id").join(c,"channel_id")

it takes it as an inner join.

On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Pretty simple, a similar construct to tables projected as DF:
>
> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
> val rs = s.join(t,"time_id").join(c,"channel_id")
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:
>
>> Hi,
>>
>> Try this one:
>>
>> df_join = df1.join(df2, 'Id', "fullouter")
>>
>> Thanks,
>> Bijay
>>
>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I tried to join two dataframes:
>>>
>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>> df_join.registerTempTable("join_test")
>>>
>>> When querying "Id" from "join_test":
>>>
>>> 0: jdbc:hive2://> select Id from join_test;
>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>> 0: jdbc:hive2://>
>>>
>>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>>
>>> Thanks
Re: Error joining dataframes
I tried

scala> var df_join = df1.join(df2, "Id", "fullouter")
<console>:27: error: type mismatch;
 found   : String("Id")
 required: org.apache.spark.sql.Column
       var df_join = df1.join(df2, "Id", "fullouter")
                                   ^

scala>

And I can't see the above method in
https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)

On Wed, May 18, 2016 at 2:22 AM, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:

> Hi,
>
> Try this one:
>
> df_join = df1.join(df2, 'Id', "fullouter")
>
> Thanks,
> Bijay
>
> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> Hi,
>>
>> I tried to join two dataframes:
>>
>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>> df_join.registerTempTable("join_test")
>>
>> When querying "Id" from "join_test":
>>
>> 0: jdbc:hive2://> select Id from join_test;
>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>> 0: jdbc:hive2://>
>>
>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>
>> Thanks
Error joining dataframes
Hi,

I tried to join two dataframes:

df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
df_join.registerTempTable("join_test")

When querying "Id" from "join_test":

0: jdbc:hive2://> select Id from join_test;
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
0: jdbc:hive2://>

Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?

Thanks
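One way to get a single "Id" out of such a full outer join is to coalesce the two key columns and keep only the result. A minimal PySpark sketch of that idea follows; the payload columns "A" and "B" are assumptions taken from the examples in this thread, and on Spark 1.6+ the list form df1.join(df2, ["Id"], "fullouter") collapses the key for you:

{{{
# Sketch: merge the two join keys with coalesce, then keep only the merged one.
# Assumed shapes: df1 has columns ("Id", "A"), df2 has ("Id", "B").
from pyspark.sql import functions as F

joined = df1.join(df2, df1["Id"] == df2["Id"], "fullouter")
merged = joined.select(
    F.coalesce(df1["Id"], df2["Id"]).alias("Id"),  # the non-null side wins
    df1["A"],
    df2["B"],
)
merged.registerTempTable("join_test")  # "Id" is now unambiguous
}}}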
Fwd: AuthorizationException while exposing via JDBC client (beeline)
Hi,

I wrote a Spark job which registers a temp table, and I exposed it via beeline (JDBC client):

$ ./bin/beeline
beeline> !connect jdbc:hive2://IP:10003 -n ram -p
0: jdbc:hive2://IP> show tables;
+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| f238       | true         |
+------------+--------------+
2 rows selected (0.309 seconds)
0: jdbc:hive2://IP>

I can view the table. When querying, I get this error message:

0: jdbc:hive2://IP> select * from f238;
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: ram is not allowed to impersonate ram (state=,code=0)
0: jdbc:hive2://IP>

I have this in hive-site.xml:

<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>false</value>
  <description>If true, the metastore Thrift interface will be secured with SASL. Clients must authenticate with Kerberos.</description>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
</property>

I have this in core-site.xml:

<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>

When persisting as a table using saveAsTable, I am able to query it via beeline.

Any idea what configuration I am missing?

Thanks
How to stop hivecontext
Hi,

I started a HiveContext as:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

I want to stop this SQL context.

Thanks
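For what it's worth, in the Spark 1.x API there is no stop() on the SQL context itself; the usual approach, shown here as a PySpark sketch of the same idea, is to stop the SparkContext it was built on, which shuts the HiveContext down with it:

{{{
# Sketch: a HiveContext (Spark 1.x) has no stop() of its own; stopping the
# underlying SparkContext tears it down as well.
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)  # sc: the existing SparkContext
# ... use sqlContext here ...
sc.stop()                     # sqlContext is unusable after this point
}}}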
Exposing temp table via Hive Thrift server
Hi,

In spark-shell (Scala), we import

org.apache.spark.sql.hive.thriftserver._

for starting the Hive Thrift server programmatically for a particular Hive context, as

HiveThriftServer2.startWithContext(hiveContext)

to expose a registered temp table for that particular session.

We used pyspark for creating the dataframe. Is there a package in Python for importing HiveThriftServer?

Thanks
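As far as I know there is no official Python wrapper for this in the 1.x releases, but the same JVM class can be reached from PySpark through the py4j gateway. A sketch of that workaround follows; it leans on the private _ssql_ctx handle, so treat it as unsupported and version-dependent:

{{{
# Sketch: start HiveThriftServer2 against this PySpark session's HiveContext
# via py4j. _ssql_ctx is a private attribute and may change between releases.
from pyspark.sql import HiveContext

hive_ctx = HiveContext(sc)                 # sc: an existing SparkContext
df.registerTempTable("my_temp_table")      # df: the dataframe to expose (assumed)

sc._gateway.jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
    .startWithContext(hive_ctx._ssql_ctx)
}}}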
Importing hive thrift server
Hi,

In spark-shell, we start the Hive Thrift server by importing:

import org.apache.spark.sql.hive.thriftserver._

Is there a package for importing it from pyspark?

Thanks
Re: Could not load shims in class org.apache.hadoop.hive.schshim.FairSchedulerShim
I am facing this same issue. Can anyone help me with this?

Thanks

On Mon, Dec 7, 2015 at 9:14 AM, Shige Song wrote:

> Hard to tell.
>
> On Mon, Dec 7, 2015 at 11:35 AM, zhangjp <592426...@qq.com> wrote:
>
>> Hi all,
>>
>> I'm using the Spark prebuilt version 1.5.2+hadoop2.6, and the Hadoop version is 2.6.2. When I use a Java JDBC client to execute SQL, there are some issues.
>>
>> java.lang.RuntimeException: Could not load shims in class org.apache.hadoop.hive.schshim.FairSchedulerShim
>>     at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:149)
>>     at org.apache.hadoop.hive.shims.ShimLoader.getSchedulerShims(ShimLoader.java:133)
>>     at org.apache.hadoop.hive.shims.Hadoop23Shims.refreshDefaultQueue(Hadoop23Shims.java:296)
>>     at org.apache.hive.service.cli.session.HiveSessionImpl.<init>(HiveSessionImpl.java:109)
>>     at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:253)
>>     at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.openSession(SparkSQLSessionManager.scala:65)
>>     at org.apache.hive.service.cli.CLIService.openSession(CLIService.java:194)
>>     at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:405)
>>     at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:297)
>>     at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
>>     at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
>>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>     at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.schshim.FairSchedulerShim
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:195)
>>     at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:146)
>>     ... 17 more
Re: Can't able to access temp table via jdbc client
Thanks for your input. But the JDBC client should show something like this:

{{{
$ ./bin/beeline
Beeline version 1.5.2 by Apache Hive
beeline> !connect jdbc:hive2://ip:1
show tables;

+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| check      | false        |
+------------+--------------+
5 rows selected (0.126 seconds)
>
}}}

The isTemporary column is not seen. I suppose the Thrift server should be started from the Spark home.

On Tue, Apr 5, 2016 at 1:08 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> Temp tables are session specific and private to the session. You will not be able to see temp tables created by another session in HiveContext.
>
> Likewise, creating a table in Hive using a syntax similar to the below
>
> CREATE TEMPORARY TABLE tmp AS
> SELECT t.calendar_month_desc, c.channel_desc, SUM(s.amount_sold) AS TotalSales
>
> will only be visible to that session and is created under /tmp/hive/hduser.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 5 April 2016 at 05:52, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> Hi,
>>
>> I started a Hive Thrift server from the Hive home:
>> ./bin/hiveserver2
>>
>> Opened a JDBC client:
>> ./bin/beeline
>> and connected to the Thrift server.
>>
>> 0: > show tables;
>> +------------+
>> |  tab_name  |
>> +------------+
>> | check      |
>> | people     |
>> +------------+
>> 4 rows selected (2.085 seconds)
>> 0: >
>>
>> I can't see the temp tables which I registered with HiveContext:
>>
>> from pyspark.sql import HiveContext, Row
>> schemaPeople.registerTempTable("testtemp")
>>
>> Can someone help me with it?
>>
>> Thanks
Exposing dataframe via thrift server
Hi,

I started the Thrift server:

cd $SPARK_HOME
./sbin/start-thriftserver.sh

Then the JDBC client:

$ ./bin/beeline
Beeline version 1.5.2 by Apache Hive
beeline> !connect jdbc:hive2://ip:1
show tables;

+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| check      | false        |
| test       | false        |
+------------+--------------+
5 rows selected (0.126 seconds)
>

It shows the tables that are persisted in the Hive metastore using saveAsTable. A temp table (registerTempTable) can't be viewed.

Can anyone help me with this?

Thanks
exception while running job as pyspark
Hi,

I get the following error when running a job as pyspark:

{{{
An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, ): java.io.IOException: Cannot run program "python2.7": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:160)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 36 more
}}}

python2.7 couldn't be found, but I am using a virtualenv with Python 2.7:

{{{
[ram@test-work workspace]$ python
Python 2.7.8 (default, Mar 15 2016, 04:37:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
}}}

Can anyone help me with this?

Thanks
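Not an authoritative fix, but the relevant knob here is usually PYSPARK_PYTHON: the worker interpreter is taken from that environment variable when the context is created, and whatever it names must exist on every executor node. A sketch with a placeholder virtualenv path (not from the thread):

{{{
# Sketch: point the workers at the virtualenv interpreter before the
# SparkContext is created; the same path must exist on every node.
import os
os.environ["PYSPARK_PYTHON"] = "/home/ram/venv/bin/python2.7"  # placeholder

from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("venv-check"))
print(sc.parallelize(range(10)).sum())  # forces a task onto the executors
}}}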
Re: Doubt on data frame
No, I am not aware of it. Can you provide me with the details regarding this?

Thanks

On Fri, Mar 11, 2016 at 8:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Temporary tables are associated with the SessionState, which is used by SQLContext.
>
> Did you keep the session?
>
> Cheers
>
> On Fri, Mar 11, 2016 at 5:02 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> Hi,
>>
>> I registered a dataframe as a table using registerTempTable, and I didn't close the Spark context.
>>
>> Will the table be available for a longer time?
>>
>> Thanks
Doubt on data frame
Hi,

I registered a dataframe as a table using registerTempTable, and I didn't close the Spark context.

Will the table be available for a longer time?

Thanks
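A small PySpark sketch of the lifetime rule (names are illustrative): the temp table lives in the catalog of the context that registered it, so it stays available exactly as long as that context and its driver process do, and no longer:

{{{
# Sketch: a temp table's lifetime is tied to the SQLContext that registered it.
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc: the existing SparkContext
df = sqlContext.createDataFrame([(1, "a")], ["id", "val"])
df.registerTempTable("t")

sqlContext.sql("SELECT COUNT(*) FROM t").show()  # works while the context lives
sc.stop()  # once the context is stopped, "t" disappears with it
}}}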
Re: spark streaming - checkpoint
On using yarn-cluster, it works fine.

On Mon, Jun 29, 2015 at 12:07 PM, ram kumar ramkumarro...@gmail.com wrote:

SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/* is set in spark-env.sh.

I think I am facing the same issue:
https://issues.apache.org/jira/browse/SPARK-6203

On Mon, Jun 29, 2015 at 11:38 AM, ram kumar ramkumarro...@gmail.com wrote:

I am using Spark 1.2.0.2.2.0.0-82 (git revision de12451) built for Hadoop 2.6.0.2.2.0.0-2041.

1) SPARK_CLASSPATH not set
2) spark.executor.extraClassPath not set

Should I upgrade my version to 1.3 and check?

On Sat, Jun 27, 2015 at 1:07 PM, Tathagata Das t...@databricks.com wrote:

Do you have SPARK_CLASSPATH set in both cases? Before and after checkpoint? If yes, then you should not be using SPARK_CLASSPATH; it has been deprecated since Spark 1.0 because of its ambiguity. Also, where do you have spark.executor.extraClassPath set? I don't see it in the spark-submit command.

On Fri, Jun 26, 2015 at 6:05 AM, ram kumar ramkumarro...@gmail.com wrote:

Hi,

JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1));
ssc.checkpoint(checkPointDir);

JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
    public JavaStreamingContext create() {
        return createContext(checkPointDir, outputDirectory);
    }
};
JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkPointDir, factory);

The first time I run this, it works fine.
But the second time, it shows the following error.
If I delete the checkpoint path, it works again.

---

[user@h7 ~]$ spark-submit --jars /home/user/examples-spark-jar.jar --conf spark.driver.allowMultipleContexts=true --class com.spark.Pick --master yarn-client --num-executors 10 --executor-cores 1 SNAPSHOT.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-06-26 12:43:42,981 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-06-26 12:43:44,246 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

Exception in thread "main" org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:334)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:332)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:332)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:320)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:320)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:178)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
    at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:566)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
    at com.orzota.kafka.kafka.TotalPicsWithScore.main(TotalPicsWithScore.java:159)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:360)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:76)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[user@h7 ~]$

Can anyone help me with it?

Thanks
Re: spark streaming - checkpoint
SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/* is set in spark-env.sh.

I think I am facing the same issue:
https://issues.apache.org/jira/browse/SPARK-6203

On Mon, Jun 29, 2015 at 11:38 AM, ram kumar ramkumarro...@gmail.com wrote:

I am using Spark 1.2.0.2.2.0.0-82 (git revision de12451) built for Hadoop 2.6.0.2.2.0.0-2041.

1) SPARK_CLASSPATH not set
2) spark.executor.extraClassPath not set

Should I upgrade my version to 1.3 and check?

On Sat, Jun 27, 2015 at 1:07 PM, Tathagata Das t...@databricks.com wrote:

Do you have SPARK_CLASSPATH set in both cases? Before and after checkpoint? If yes, then you should not be using SPARK_CLASSPATH; it has been deprecated since Spark 1.0 because of its ambiguity. Also, where do you have spark.executor.extraClassPath set? I don't see it in the spark-submit command.

On Fri, Jun 26, 2015 at 6:05 AM, ram kumar ramkumarro...@gmail.com wrote:

Hi,

JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1));
ssc.checkpoint(checkPointDir);

JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
    public JavaStreamingContext create() {
        return createContext(checkPointDir, outputDirectory);
    }
};
JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkPointDir, factory);

The first time I run this, it works fine.
But the second time, it shows the following error.
If I delete the checkpoint path, it works again.

---

[user@h7 ~]$ spark-submit --jars /home/user/examples-spark-jar.jar --conf spark.driver.allowMultipleContexts=true --class com.spark.Pick --master yarn-client --num-executors 10 --executor-cores 1 SNAPSHOT.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-06-26 12:43:42,981 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-06-26 12:43:44,246 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

Exception in thread "main" org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:334)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:332)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:332)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:320)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:320)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:178)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
    at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:566)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
    at com.orzota.kafka.kafka.TotalPicsWithScore.main(TotalPicsWithScore.java:159)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:360)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:76)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[user@h7 ~]$

Can anyone help me with it?

Thanks
spark streaming - checkpoint
Hi,

JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1));
ssc.checkpoint(checkPointDir);

JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
    public JavaStreamingContext create() {
        return createContext(checkPointDir, outputDirectory);
    }
};
JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkPointDir, factory);

The first time I run this, it works fine.
But the second time, it shows the following error.
If I delete the checkpoint path, it works again.

---

[user@h7 ~]$ spark-submit --jars /home/user/examples-spark-jar.jar --conf spark.driver.allowMultipleContexts=true --class com.spark.Pick --master yarn-client --num-executors 10 --executor-cores 1 SNAPSHOT.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-06-26 12:43:42,981 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-06-26 12:43:44,246 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

Exception in thread "main" org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:334)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:332)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:332)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:320)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:320)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:178)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
    at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:566)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
    at com.orzota.kafka.kafka.TotalPicsWithScore.main(TotalPicsWithScore.java:159)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:360)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:76)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[user@h7 ~]$

Can anyone help me with it?

Thanks
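For comparison, here is a rough PySpark sketch of the same getOrCreate checkpoint-recovery pattern (the checkpoint path and socket source are placeholders, not taken from the thread, and getOrCreate appears in the PySpark streaming API of later 1.x releases):

{{{
# Sketch: on a fresh start create_context() builds the graph; on restart the
# context is reconstructed from the checkpoint and create_context() is skipped.
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

checkpoint_dir = "hdfs:///tmp/streaming-checkpoint"  # placeholder path

def create_context():
    sc = SparkContext(conf=SparkConf().setAppName("checkpoint-demo"))
    ssc = StreamingContext(sc, 10)                    # 10-second batches
    ssc.checkpoint(checkpoint_dir)
    ssc.socketTextStream("localhost", 9999).pprint()  # placeholder source
    return ssc

ssc = StreamingContext.getOrCreate(checkpoint_dir, create_context)
ssc.start()
ssc.awaitTermination()
}}}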