Error starting HiveServer2: could not start ThriftBinaryCLIService
Hi all,

I started the Hive Thrift server with this command:

./sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10003

The Thrift server started on that particular node without any error. When doing the same, except pointing to a different node to start the server:

./sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.bind.host=DIFFERENT_NODE_IP --hiveconf hive.server2.thrift.port=10003

I am getting the following error:

16/07/15 13:04:35 INFO service.AbstractService: Service:ThriftBinaryCLIService is started.
16/07/15 13:04:35 INFO service.AbstractService: Service:HiveServer2 is started.
16/07/15 13:04:35 INFO thriftserver.HiveThriftServer2: HiveThriftServer2 started
16/07/15 13:04:36 ERROR thrift.ThriftCLIService: Error:
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address DIFFERENT_NODE_IP:10003.
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:93)
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:79)
    at org.apache.hive.service.auth.HiveAuthFactory.getServerSocket(HiveAuthFactory.java:236)
    at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:69)
    at java.lang.Thread.run(Thread.java:745)

Can anyone help me with this?

Thanks,
Ram.
Re: Getting error in inputfile | inputFile
Check the "inputFile" variable name. lol

On Fri, Jul 15, 2016 at 12:12 PM, RK Spark wrote:

> I am using Spark version 1.5.1, and I am getting errors in the first program of Spark, i.e., word count. Please help me to solve this.
>
> scala> val inputfile = sc.textFile("input.txt")
> inputfile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[13] at textFile at <console>:21
>
> scala> val counts = inputFile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_ + _);
> <console>:19: error: not found: value inputFile
>        val counts = inputFile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_ + _);
>                     ^
Re: Spark with HBase Error - Py4JJavaError
Hi Puneet,

Have you tried appending --jars $SPARK_HOME/lib/spark-examples-*.jar to the execution command?

Ram

On Thu, Jul 7, 2016 at 5:19 PM, Puneet Tripathi <puneet.tripa...@dunnhumby.com> wrote:

> Guys, please can anyone help on the issue below?
>
> Puneet
>
> From: Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com]
> Sent: Thursday, July 07, 2016 12:42 PM
> To: user@spark.apache.org
> Subject: Spark with HBase Error - Py4JJavaError
>
> Hi,
>
> We are running HBase in fully distributed mode. I tried to connect to HBase via pyspark and then write to HBase using saveAsNewAPIHadoopDataset, but it failed. The error says:
>
> Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
> : java.lang.ClassNotFoundException: org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>
> I have been able to create pythonconverters.jar, and then did the below:
>
> 1. I think we have to copy this to a location on HDFS; /sparkjars/ seems as good a directory to create as any. I think the file has to be world readable.
> 2. Set the spark_jar_hdfs_path property in Cloudera Manager, e.g. hdfs:///sparkjars.
>
> It still doesn't seem to work. Can someone please help me with this?
>
> Regards,
> Puneet
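In case it helps to see how those converters are meant to be wired in, below is a rough PySpark sketch modeled on the hbase_outputformat example that ships with Spark. The ZooKeeper host, table name, and row contents are placeholders, and the converter classes live in the spark-examples jar, so that jar must be shipped with the job (e.g. via --jars as suggested above):

{{{
# Hypothetical sketch after Spark's examples/src/main/python/hbase_outputformat.py.
# "zk-host" and "test_table" are placeholders, not values from this thread.
conf = {
    "hbase.zookeeper.quorum": "zk-host",
    "hbase.mapred.outputtable": "test_table",
    "mapreduce.outputformat.class":
        "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
    "mapreduce.job.output.key.class":
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "mapreduce.job.output.value.class":
        "org.apache.hadoop.io.Writable",
}

# Each element: (rowkey, [rowkey, column_family, qualifier, value])
rows = sc.parallelize([("row1", ["row1", "f1", "q1", "value1"])])

rows.saveAsNewAPIHadoopDataset(
    conf=conf,
    keyConverter="org.apache.spark.examples.pythonconverters."
                 "StringToImmutableBytesWritableConverter",
    valueConverter="org.apache.spark.examples.pythonconverters."
                   "StringListToPutConverter",
)
}}}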
Re: Error joining dataframes
I tried it, e.g.:

df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")

+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+

If I try:

df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))

+----+----+----+
|   A|  id|   B|
+----+----+----+
|   0|null|null|
|   0|   2|   0|
|null|   3|   0|
+----+----+----+

the "Id" = 1 will be lost.

On Wed, May 18, 2016 at 1:52 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:

> Can you try
> var df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))
>
> On May 18, 2016 2:16 PM, "ram kumar" <ramkumarro...@gmail.com> wrote:
>
> I tried
>
> scala> var df_join = df1.join(df2, "Id", "fullouter")
> <console>:27: error: type mismatch;
>  found   : String("Id")
>  required: org.apache.spark.sql.Column
>        var df_join = df1.join(df2, "Id", "fullouter")
>                                    ^
>
> scala>
>
> And I can't see the above method in
> https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)
>
>> Hi,
>>
>> Try this one:
>>
>> df_join = df1.join(df2, 'Id', "fullouter")
>>
>> Thanks,
>> Bijay
>>
>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I tried to join two dataframes:
>>>
>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>> df_join.registerTempTable("join_test")
>>>
>>> When querying "Id" from "join_test":
>>>
>>> 0: jdbc:hive2://> select Id from join_test;
>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>> 0: jdbc:hive2://>
>>>
>>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>>
>>> Thanks
Re: Error joining dataframes
When you register a temp table from the dataframe, e.g.:

var df_join = df1.join(df2, df1("id") === df2("id"), "outer")
df_join.registerTempTable("test")
sqlContext.sql("select * from test")

+----+----+----+----+
|  id|   A|  id|   B|
+----+----+----+----+
|   1|   0|null|null|
|   2|   0|   2|   0|
|null|null|   3|   0|
+----+----+----+----+

but when you query the "id":

sqlContext.sql("select id from test")

Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)

On Wed, May 18, 2016 at 12:44 PM, Takeshi Yamamuro <linguin@gmail.com> wrote:

> Looks weird; it seems spark-v1.5.x can accept the query. What's the difference between the example and your query?
>
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>       /_/
>
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
>
> val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
> val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> // Exiting paste mode, now interpreting.
>
> +----+----+----+----+
> |  id|   A|  id|   B|
> +----+----+----+----+
> |   1|   0|null|null|
> |   2|   0|   2|   0|
> |null|null|   3|   0|
> +----+----+----+----+
>
> df1: org.apache.spark.sql.DataFrame = [id: int, A: int]
> df2: org.apache.spark.sql.DataFrame = [id: int, B: int]
>
> On Wed, May 18, 2016 at 3:52 PM, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> I tried
>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>
>> But there is a duplicate "id", and when I query the "id", I get
>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>
>> I am currently using Spark 1.5.2.
>> Is there any alternative way in 1.5?
>>
>> Thanks
>>
>> On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <linguin@gmail.com> wrote:
>>
>>> Also, you can pass the query that you'd like to use in spark-v1.6+;
>>>
>>> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
>>> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
>>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>>
>>> // maropu
>>>
>>> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarro...@gmail.com> wrote:
>>>
>>>> If I run it as
>>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>>
>>>> it takes it as an inner join.
>>>>
>>>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Pretty simple, a similar construct to tables projected as DF:
>>>>>
>>>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>>>> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Try this one:
>>>>>>
>>>>>> df_join = df1.join(df2, 'Id', "fullouter")
>>>>>>
>>>>>> Thanks,
>>>>>> Bijay
>>>>>>
>>>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tried to join two dataframes:
>>>>>>>
>>>>>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>>>>>> df_join.registerTempTable("join_test")
>>>>>>>
>>>>>>> When querying "Id" from "join_test":
>>>>>>>
>>>>>>> 0: jdbc:hive2://> select Id from join_test;
>>>>>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>>>> 0: jdbc:hive2://>
>>>>>>>
>>>>>>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>>>>>>
>>>>>>> Thanks
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>
> --
> ---
> Takeshi Yamamuro
Re: Error joining dataframes
I tried
df1.join(df2, df1("id") === df2("id"), "outer").show

But there is a duplicate "id", and when I query the "id", I get
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)

I am currently using Spark 1.5.2.
Is there any alternative way in 1.5?

Thanks

On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <linguin@gmail.com> wrote:

> Also, you can pass the query that you'd like to use in spark-v1.6+;
>
> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> // maropu
>
> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> If I run it as
>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>
>> it takes it as an inner join.
>>
>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Pretty simple, a similar construct to tables projected as DF:
>>>
>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Try this one:
>>>>
>>>> df_join = df1.join(df2, 'Id', "fullouter")
>>>>
>>>> Thanks,
>>>> Bijay
>>>>
>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to join two dataframes:
>>>>>
>>>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>>>> df_join.registerTempTable("join_test")
>>>>>
>>>>> When querying "Id" from "join_test":
>>>>>
>>>>> 0: jdbc:hive2://> select Id from join_test;
>>>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>> 0: jdbc:hive2://>
>>>>>
>>>>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>>>>
>>>>> Thanks

--
---
Takeshi Yamamuro
Re: Error joining dataframes
If I run it as
val rs = s.join(t,"time_id").join(c,"channel_id")

it takes it as an inner join.

On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Pretty simple, a similar construct to tables projected as DF:
>
> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
> val rs = s.join(t,"time_id").join(c,"channel_id")
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:
>
>> Hi,
>>
>> Try this one:
>>
>> df_join = df1.join(df2, 'Id', "fullouter")
>>
>> Thanks,
>> Bijay
>>
>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I tried to join two dataframes:
>>>
>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>> df_join.registerTempTable("join_test")
>>>
>>> When querying "Id" from "join_test":
>>>
>>> 0: jdbc:hive2://> select Id from join_test;
>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>> 0: jdbc:hive2://>
>>>
>>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>>
>>> Thanks
Re: Error joining dataframes
I tried

scala> var df_join = df1.join(df2, "Id", "fullouter")
<console>:27: error: type mismatch;
 found   : String("Id")
 required: org.apache.spark.sql.Column
       var df_join = df1.join(df2, "Id", "fullouter")
                                   ^

scala>

And I can't see the above method in
https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)

On Wed, May 18, 2016 at 2:22 AM, Bijay Kumar Pathak <bkpat...@mtu.edu> wrote:

> Hi,
>
> Try this one:
>
> df_join = df1.join(df2, 'Id', "fullouter")
>
> Thanks,
> Bijay
>
> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> Hi,
>>
>> I tried to join two dataframes:
>>
>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>> df_join.registerTempTable("join_test")
>>
>> When querying "Id" from "join_test":
>>
>> 0: jdbc:hive2://> select Id from join_test;
>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>> 0: jdbc:hive2://>
>>
>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?
>>
>> Thanks
Error joining dataframes
Hi,

I tried to join two dataframes:

df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
df_join.registerTempTable("join_test")

When querying "Id" from "join_test":

0: jdbc:hive2://> select Id from join_test;
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
0: jdbc:hive2://>

Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"?

Thanks
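One way to get a single "Id" out of such a full outer join is to coalesce the two key columns and keep only the result. A minimal PySpark sketch of that idea follows; the payload columns "A" and "B" are assumptions taken from the examples in this thread, and on Spark 1.6+ the list form df1.join(df2, ["Id"], "fullouter") collapses the key for you:

{{{
# Sketch: merge the two join keys with coalesce, then keep only the merged one.
# Assumed shapes: df1 has columns ("Id", "A"), df2 has ("Id", "B").
from pyspark.sql import functions as F

joined = df1.join(df2, df1["Id"] == df2["Id"], "fullouter")
merged = joined.select(
    F.coalesce(df1["Id"], df2["Id"]).alias("Id"),  # the non-null side wins
    df1["A"],
    df2["B"],
)
merged.registerTempTable("join_test")  # "Id" is now unambiguous
}}}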
Fwd: AuthorizationException while exposing via JDBC client (beeline)
Hi,

I wrote a Spark job which registers a temp table, and I exposed it via beeline (JDBC client):

$ ./bin/beeline
beeline> !connect jdbc:hive2://IP:10003 -n ram -p
0: jdbc:hive2://IP> show tables;
+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| f238       | true         |
+------------+--------------+
2 rows selected (0.309 seconds)
0: jdbc:hive2://IP>

I can view the table. When querying, I get this error message:

0: jdbc:hive2://IP> select * from f238;
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: ram is not allowed to impersonate ram (state=,code=0)
0: jdbc:hive2://IP>

I have this in hive-site.xml:

<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>false</value>
  <description>If true, the metastore Thrift interface will be secured with SASL. Clients must authenticate with Kerberos.</description>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
</property>

I have this in core-site.xml:

<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>

When persisting as a table using saveAsTable, I am able to query it via beeline.

Any idea what configuration I am missing?

Thanks
How to stop hivecontext
Hi,

I started a HiveContext as:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

I want to stop this SQL context.

Thanks
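For what it's worth, in the Spark 1.x API there is no stop() on the SQL context itself; the usual approach, shown here as a PySpark sketch of the same idea, is to stop the SparkContext it was built on, which shuts the HiveContext down with it:

{{{
# Sketch: a HiveContext (Spark 1.x) has no stop() of its own; stopping the
# underlying SparkContext tears it down as well.
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)  # sc: the existing SparkContext
# ... use sqlContext here ...
sc.stop()                     # sqlContext is unusable after this point
}}}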
Exposing temp table via Hive Thrift server
Hi,

In spark-shell (Scala), we import

org.apache.spark.sql.hive.thriftserver._

for starting the Hive Thrift server programmatically for a particular Hive context, as

HiveThriftServer2.startWithContext(hiveContext)

to expose a registered temp table for that particular session.

We used pyspark for creating the dataframe. Is there a package in Python for importing HiveThriftServer?

Thanks
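As far as I know there is no official Python wrapper for this in the 1.x releases, but the same JVM class can be reached from PySpark through the py4j gateway. A sketch of that workaround follows; it leans on the private _ssql_ctx handle, so treat it as unsupported and version-dependent:

{{{
# Sketch: start HiveThriftServer2 against this PySpark session's HiveContext
# via py4j. _ssql_ctx is a private attribute and may change between releases.
from pyspark.sql import HiveContext

hive_ctx = HiveContext(sc)                 # sc: an existing SparkContext
df.registerTempTable("my_temp_table")      # df: the dataframe to expose (assumed)

sc._gateway.jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
    .startWithContext(hive_ctx._ssql_ctx)
}}}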
Importing hive thrift server
Hi,

In spark-shell, we start the Hive Thrift server by importing:

import org.apache.spark.sql.hive.thriftserver._

Is there a package for importing it from pyspark?

Thanks
Re: Could not load shims in class org.apache.hadoop.hive.schshim.FairSchedulerShim
I am facing this same issue. Can anyone help me with this?

Thanks

On Mon, Dec 7, 2015 at 9:14 AM, Shige Song wrote:

> Hard to tell.
>
> On Mon, Dec 7, 2015 at 11:35 AM, zhangjp <592426...@qq.com> wrote:
>
>> Hi all,
>>
>> I'm using the Spark prebuilt version 1.5.2+hadoop2.6, and the Hadoop version is 2.6.2. When I use a Java JDBC client to execute SQL, there are some issues.
>>
>> java.lang.RuntimeException: Could not load shims in class org.apache.hadoop.hive.schshim.FairSchedulerShim
>>     at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:149)
>>     at org.apache.hadoop.hive.shims.ShimLoader.getSchedulerShims(ShimLoader.java:133)
>>     at org.apache.hadoop.hive.shims.Hadoop23Shims.refreshDefaultQueue(Hadoop23Shims.java:296)
>>     at org.apache.hive.service.cli.session.HiveSessionImpl.<init>(HiveSessionImpl.java:109)
>>     at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:253)
>>     at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.openSession(SparkSQLSessionManager.scala:65)
>>     at org.apache.hive.service.cli.CLIService.openSession(CLIService.java:194)
>>     at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:405)
>>     at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:297)
>>     at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
>>     at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
>>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>     at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.schshim.FairSchedulerShim
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:195)
>>     at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:146)
>>     ... 17 more
Re: Can't able to access temp table via jdbc client
Thanks for your input. But the JDBC client should show something like this:

{{{
$ ./bin/beeline
Beeline version 1.5.2 by Apache Hive
beeline> !connect jdbc:hive2://ip:1
show tables;

+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| check      | false        |
+------------+--------------+
5 rows selected (0.126 seconds)
>
}}}

The isTemporary column is not seen. I suppose the Thrift server should be started from the Spark home.

On Tue, Apr 5, 2016 at 1:08 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> Temp tables are session specific and private to the session. You will not be able to see temp tables created by another session in HiveContext.
>
> Likewise, creating a table in Hive using a syntax similar to the below
>
> CREATE TEMPORARY TABLE tmp AS
> SELECT t.calendar_month_desc, c.channel_desc, SUM(s.amount_sold) AS TotalSales
>
> will only be visible to that session and is created under /tmp/hive/hduser.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 5 April 2016 at 05:52, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> Hi,
>>
>> I started a Hive Thrift server from the Hive home:
>> ./bin/hiveserver2
>>
>> Opened a JDBC client:
>> ./bin/beeline
>> and connected to the Thrift server.
>>
>> 0: > show tables;
>> +------------+
>> |  tab_name  |
>> +------------+
>> | check      |
>> | people     |
>> +------------+
>> 4 rows selected (2.085 seconds)
>> 0: >
>>
>> I can't see the temp tables which I registered with HiveContext:
>>
>> from pyspark.sql import HiveContext, Row
>> schemaPeople.registerTempTable("testtemp")
>>
>> Can someone help me with it?
>>
>> Thanks
Exposing dataframe via thrift server
Hi,

I started the Thrift server:

cd $SPARK_HOME
./sbin/start-thriftserver.sh

Then the JDBC client:

$ ./bin/beeline
Beeline version 1.5.2 by Apache Hive
beeline> !connect jdbc:hive2://ip:1
show tables;

+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| check      | false        |
| test       | false        |
+------------+--------------+
5 rows selected (0.126 seconds)
>

It shows the tables that are persisted in the Hive metastore using saveAsTable. A temp table (registerTempTable) can't be viewed.

Can anyone help me with this?

Thanks
exception while running job as pyspark
Hi,

I get the following error when running a job as pyspark:

{{{
An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, ): java.io.IOException: Cannot run program "python2.7": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:160)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 36 more
}}}

python2.7 couldn't be found, but I am using a virtualenv with Python 2.7:

{{{
[ram@test-work workspace]$ python
Python 2.7.8 (default, Mar 15 2016, 04:37:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
}}}

Can anyone help me with this?

Thanks
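Not an authoritative fix, but the relevant knob here is usually PYSPARK_PYTHON: the worker interpreter is taken from that environment variable when the context is created, and whatever it names must exist on every executor node. A sketch with a placeholder virtualenv path (not from the thread):

{{{
# Sketch: point the workers at the virtualenv interpreter before the
# SparkContext is created; the same path must exist on every node.
import os
os.environ["PYSPARK_PYTHON"] = "/home/ram/venv/bin/python2.7"  # placeholder

from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("venv-check"))
print(sc.parallelize(range(10)).sum())  # forces a task onto the executors
}}}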
Re: Doubt on data frame
No, I am not aware of it. Can you provide me with the details regarding this?

Thanks

On Fri, Mar 11, 2016 at 8:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Temporary tables are associated with the SessionState, which is used by SQLContext.
>
> Did you keep the session?
>
> Cheers
>
> On Fri, Mar 11, 2016 at 5:02 AM, ram kumar <ramkumarro...@gmail.com> wrote:
>
>> Hi,
>>
>> I registered a dataframe as a table using registerTempTable, and I didn't close the Spark context.
>>
>> Will the table be available for a longer time?
>>
>> Thanks
Doubt on data frame
Hi,

I registered a dataframe as a table using registerTempTable, and I didn't close the Spark context.

Will the table be available for a longer time?

Thanks
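A small PySpark sketch of the lifetime rule (names are illustrative): the temp table lives in the catalog of the context that registered it, so it stays available exactly as long as that context and its driver process do, and no longer:

{{{
# Sketch: a temp table's lifetime is tied to the SQLContext that registered it.
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc: the existing SparkContext
df = sqlContext.createDataFrame([(1, "a")], ["id", "val"])
df.registerTempTable("t")

sqlContext.sql("SELECT COUNT(*) FROM t").show()  # works while the context lives
sc.stop()  # once the context is stopped, "t" disappears with it
}}}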
Re: spark streaming - checkpoint
On using yarn-cluster, it works fine.

On Mon, Jun 29, 2015 at 12:07 PM, ram kumar ramkumarro...@gmail.com wrote:

SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/* is set in spark-env.sh.

I think I am facing the same issue:
https://issues.apache.org/jira/browse/SPARK-6203

On Mon, Jun 29, 2015 at 11:38 AM, ram kumar ramkumarro...@gmail.com wrote:

I am using Spark 1.2.0.2.2.0.0-82 (git revision de12451) built for Hadoop 2.6.0.2.2.0.0-2041.

1) SPARK_CLASSPATH not set
2) spark.executor.extraClassPath not set

Should I upgrade my version to 1.3 and check?

On Sat, Jun 27, 2015 at 1:07 PM, Tathagata Das t...@databricks.com wrote:

Do you have SPARK_CLASSPATH set in both cases? Before and after checkpoint? If yes, then you should not be using SPARK_CLASSPATH; it has been deprecated since Spark 1.0 because of its ambiguity. Also, where do you have spark.executor.extraClassPath set? I don't see it in the spark-submit command.

On Fri, Jun 26, 2015 at 6:05 AM, ram kumar ramkumarro...@gmail.com wrote:

Hi,

JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1));
ssc.checkpoint(checkPointDir);

JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
    public JavaStreamingContext create() {
        return createContext(checkPointDir, outputDirectory);
    }
};
JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkPointDir, factory);

The first time I run this, it works fine.
But the second time, it shows the following error.
If I delete the checkpoint path, it works again.

---

[user@h7 ~]$ spark-submit --jars /home/user/examples-spark-jar.jar --conf spark.driver.allowMultipleContexts=true --class com.spark.Pick --master yarn-client --num-executors 10 --executor-cores 1 SNAPSHOT.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-06-26 12:43:42,981 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-06-26 12:43:44,246 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

Exception in thread "main" org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:334)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:332)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:332)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:320)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:320)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:178)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
    at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:566)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
    at com.orzota.kafka.kafka.TotalPicsWithScore.main(TotalPicsWithScore.java:159)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:360)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:76)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[user@h7 ~]$

Can anyone help me with it?

Thanks
Re: spark streaming - checkpoint
SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/* is set in spark-env.sh.

I think I am facing the same issue:
https://issues.apache.org/jira/browse/SPARK-6203

On Mon, Jun 29, 2015 at 11:38 AM, ram kumar ramkumarro...@gmail.com wrote:

I am using Spark 1.2.0.2.2.0.0-82 (git revision de12451) built for Hadoop 2.6.0.2.2.0.0-2041.

1) SPARK_CLASSPATH not set
2) spark.executor.extraClassPath not set

Should I upgrade my version to 1.3 and check?

On Sat, Jun 27, 2015 at 1:07 PM, Tathagata Das t...@databricks.com wrote:

Do you have SPARK_CLASSPATH set in both cases? Before and after checkpoint? If yes, then you should not be using SPARK_CLASSPATH; it has been deprecated since Spark 1.0 because of its ambiguity. Also, where do you have spark.executor.extraClassPath set? I don't see it in the spark-submit command.

On Fri, Jun 26, 2015 at 6:05 AM, ram kumar ramkumarro...@gmail.com wrote:

Hi,

JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1));
ssc.checkpoint(checkPointDir);

JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
    public JavaStreamingContext create() {
        return createContext(checkPointDir, outputDirectory);
    }
};
JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkPointDir, factory);

The first time I run this, it works fine.
But the second time, it shows the following error.
If I delete the checkpoint path, it works again.

---

[user@h7 ~]$ spark-submit --jars /home/user/examples-spark-jar.jar --conf spark.driver.allowMultipleContexts=true --class com.spark.Pick --master yarn-client --num-executors 10 --executor-cores 1 SNAPSHOT.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-06-26 12:43:42,981 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-06-26 12:43:44,246 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

Exception in thread "main" org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:334)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:332)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:332)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:320)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:320)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:178)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
    at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:566)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
    at com.orzota.kafka.kafka.TotalPicsWithScore.main(TotalPicsWithScore.java:159)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:360)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:76)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[user@h7 ~]$

Can anyone help me with it?

Thanks
spark streaming - checkpoint
Hi,

JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1));
ssc.checkpoint(checkPointDir);

JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
    public JavaStreamingContext create() {
        return createContext(checkPointDir, outputDirectory);
    }
};
JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkPointDir, factory);

The first time I run this, it works fine.
But the second time, it shows the following error.
If I delete the checkpoint path, it works again.

---

[user@h7 ~]$ spark-submit --jars /home/user/examples-spark-jar.jar --conf spark.driver.allowMultipleContexts=true --class com.spark.Pick --master yarn-client --num-executors 10 --executor-cores 1 SNAPSHOT.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-06-26 12:43:42,981 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-06-26 12:43:44,246 WARN [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

Exception in thread "main" org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:334)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$7.apply(SparkConf.scala:332)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:332)
    at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:320)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:320)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:178)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
    at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:566)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
    at com.orzota.kafka.kafka.TotalPicsWithScore.main(TotalPicsWithScore.java:159)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:360)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:76)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[user@h7 ~]$

Can anyone help me with it?

Thanks
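For comparison, here is a rough PySpark sketch of the same getOrCreate checkpoint-recovery pattern (the checkpoint path and socket source are placeholders, not taken from the thread, and getOrCreate appears in the PySpark streaming API of later 1.x releases):

{{{
# Sketch: on a fresh start create_context() builds the graph; on restart the
# context is reconstructed from the checkpoint and create_context() is skipped.
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

checkpoint_dir = "hdfs:///tmp/streaming-checkpoint"  # placeholder path

def create_context():
    sc = SparkContext(conf=SparkConf().setAppName("checkpoint-demo"))
    ssc = StreamingContext(sc, 10)                    # 10-second batches
    ssc.checkpoint(checkpoint_dir)
    ssc.socketTextStream("localhost", 9999).pprint()  # placeholder source
    return ssc

ssc = StreamingContext.getOrCreate(checkpoint_dir, create_context)
ssc.start()
ssc.awaitTermination()
}}}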