issue with regexp_replace
Hi Team, I am trying to use regexp_replace in Spark SQL and it is throwing this error:

    expected , but found Scalar in 'reader', line 9, column 45: ... select translate(payload, '"', '"') as payload

I am trying to replace every occurrence of the escaped quote sequence \" in the payload with a plain " character.
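For reference, a rough sketch of the replacement I am after, runnable from spark-shell (the events table and payload column are placeholders; how many backslashes the pattern needs can vary with the Spark version and the spark.sql.parser.escapedStringLiterals setting):

    // Replace the two-character sequence \" with a plain " in payload.
    // With default parser settings, the SQL literal '\\\\"' is unescaped to
    // the regex \\" which matches one literal backslash followed by a quote.
    spark.sql("""SELECT regexp_replace(payload, '\\\\"', '"') AS payload FROM events""").show(false)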
convert json string in spark sql
Hi Team, I have Kafka messages where the JSON comes in as a string. How can I create a table after converting the JSON string into JSON using Spark SQL?
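A minimal sketch of the direction I am exploring, assuming Spark 2.1+ where from_json is available (the schema, column names, and view name are placeholders; the same parsing applies to the value column read from Kafka):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("json-string-to-table").getOrCreate()
    import spark.implicits._

    // Placeholder schema matching the JSON payload in the Kafka messages.
    val schema = new StructType()
      .add("id", StringType)
      .add("amount", DoubleType)

    // Stand-in for the value column read from Kafka.
    val raw = Seq("""{"id": "a", "amount": 1.0}""").toDF("value")

    // Parse the string into columns and expose the result as a SQL table.
    raw.select(from_json($"value", schema).as("js"))
      .select("js.*")
      .createOrReplaceTempView("events_json")

    spark.sql("SELECT * FROM events_json").show()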
Re: Spark-hive integration on HDInsight
Hey Jay, how are you creating your cluster? Are you using a Spark cluster? All of this should be set up automatically.

Sent from my iPhone

> On Feb 21, 2019, at 12:12 PM, Felix Cheung wrote:
>
> You should check with HDInsight support
>
> From: Jay Singh
> Sent: Wednesday, February 20, 2019 11:43:23 PM
> To: User
> Subject: Spark-hive integration on HDInsight
>
> I am trying to integrate Spark with Hive on an HDInsight Spark cluster.
> I copied hive-site.xml into the spark/conf directory. In addition I added
> Hive metastore properties like the JDBC connection info on Ambari as well.
> But still the databases and tables created using spark-sql are not visible
> in Hive.
> I also changed the 'spark.sql.warehouse.dir' value to point to the Hive
> warehouse directory.
> Spark does work with Hive when LLAP is not on. What am I missing in the
> configuration to integrate Spark with Hive? Any pointers will be
> appreciated.
>
> thx
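For anyone comparing notes, a minimal sketch of what the Spark side usually needs for the metastore to be shared (the warehouse path is a placeholder; the real value should come from hive-site.xml / Ambari):

    import org.apache.spark.sql.SparkSession

    // enableHiveSupport points Spark at the Hive metastore configured in
    // hive-site.xml instead of its own default catalog.
    val spark = SparkSession.builder()
      .appName("hive-integration-check")
      .config("spark.sql.warehouse.dir", "/hive/warehouse") // placeholder path
      .enableHiveSupport()
      .getOrCreate()

    // Tables created through this session should now be visible from Hive too.
    spark.sql("SHOW DATABASES").show()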
executing stored procedure through spark
Hi Team, the way we currently execute a stored procedure is by calling a Java program. Is there any way we can achieve the same using PySpark?
Re: Can we deploy python script on a spark cluster
Hi Lehak, you can make a Scala project with an Oozie class and one run class which will ship your Python file to the cluster. Define an Oozie coordinator with a Spark action or shell action. We are deploying PySpark-based machine learning code this way.

Sent from my iPhone

> On Aug 2, 2018, at 8:46 AM, Lehak Dharmani wrote:
>
> We are trying to deploy a Python script on a Spark cluster. However, as per
> the documentation, it is not possible to deploy Python applications on a
> cluster. Is there any alternative?
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Pyspark is not picking up correct python version on azure hdinsight
Hi Guys, PySpark is not picking up the correct Python version on Azure HDInsight. The properties are set up in spark2-env as follows:

    PYSPARK_PYTHON=${PYSPARK3_PYTHON:-/usr/bin/anaconda/envs/py35/bin/python3}
    export PYSPARK_DRIVER_PYTHON=${PYSPARK3_PYTHON:-/usr/bin/anaconda/envs/py35/bin/python3}

Thanks
Re: help needed in performance improvement of spark structured streaming
Hi team, any help with this?

On Sat, May 5, 2018 at 12:20 PM, amit kumar singh wrote:
> Hi Community,
>
> I have a use case where I need to call a stored procedure through
> structured streaming.
>
> I am able to send a Kafka message and call the stored procedure, but since
> the foreach sink keeps executing the stored procedure per message, I want
> to combine all the messages into a single dataframe and then call the
> stored procedure once.
>
> Is it possible to do this?
>
> Current code:
>
>     select('value cast "string", 'topic)
>       .select('topic, concat_ws(",", 'value cast "string") as 'value1)
>       .groupBy('topic cast "string").count()
>       .coalesce(1)
>       .as[String]
>       .writeStream
>       .trigger(ProcessingTime("60 seconds"))
>       .option("checkpointLocation", checkpointUrl)
>       .foreach(new SimpleSqlServerSink(jdbcUrl, connectionProperties))
>
> thanks
> rohit
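What I am experimenting with now: a sketch using foreachBatch, which needs Spark 2.4+ (broker, topic, JDBC URL, and the stored procedure name are placeholders; collecting to the driver is only sensible if each batch is small):

    import java.sql.DriverManager
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.streaming.Trigger

    val spark = SparkSession.builder().appName("proc-per-batch").getOrCreate()
    import spark.implicits._

    val jdbcUrl = "jdbc:sqlserver://host:1433;database=db" // placeholder
    val checkpointUrl = "/tmp/checkpoints/proc-per-batch"  // placeholder

    val stream = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    stream.writeStream
      .trigger(Trigger.ProcessingTime("60 seconds"))
      .option("checkpointLocation", checkpointUrl)
      // foreachBatch hands over the whole micro-batch as one DataFrame,
      // so the stored procedure runs once per trigger, not once per row.
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        val payload = batch.as[String].collect().mkString(",")
        val conn = DriverManager.getConnection(jdbcUrl)
        try {
          val stmt = conn.prepareCall("{call dbo.process_messages(?)}") // placeholder
          stmt.setString(1, payload)
          stmt.execute()
          stmt.close()
        } finally conn.close()
      }
      .start()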
How can we group messages coming in per batch of structured streaming
Hi Team, I have a requirement where I need to combine all the JSON messages coming in a batch of structured streaming into one single JSON message, separated by a comma or any other delimiter, and store it. I have tried to group by Kafka partition, and I tried using concat, but it is not working. Thanks, Amit
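One shape that might express this, a sketch assuming the goal is one comma-separated string per topic per micro-batch (meant to be called on each batch, e.g. from foreachBatch on Spark 2.4+):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

    // Combine every message in the micro-batch into a single delimited
    // string per topic.
    def combineBatch(batch: DataFrame): DataFrame =
      batch.selectExpr("topic", "CAST(value AS STRING) AS value")
        .groupBy(col("topic"))
        .agg(concat_ws(",", collect_list(col("value"))).as("combined"))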
help in copying data from one azure subscription to another azure subscription
Hi Team, we are trying to move data from one Azure subscription to another Azure subscription. Is there a faster way to do this through Spark? I am using distcp and it is taking forever. Thanks, Rohit
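In case it helps frame the question, a sketch of the Spark-side copy I have been considering instead of distcp (the storage accounts, containers, paths, and format are placeholders, and it assumes the wasbs connector and both account keys are already configured; note this rewrites the files rather than copying bytes):

    // Read from the source subscription's storage account and write to the
    // destination's; Spark parallelizes the copy across the whole cluster.
    spark.read.parquet("wasbs://src-container@sourceacct.blob.core.windows.net/data")
      .write.mode("overwrite")
      .parquet("wasbs://dst-container@destacct.blob.core.windows.net/data")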
help needed in performance improvement of spark structured streaming
Hi Community,

I have a use case where I need to call a stored procedure through structured streaming.

I am able to send a Kafka message and call the stored procedure, but since the foreach sink keeps executing the stored procedure per message, I want to combine all the messages into a single dataframe and then call the stored procedure once.

Is it possible to do this?

Current code:

    select('value cast "string", 'topic)
      .select('topic, concat_ws(",", 'value cast "string") as 'value1)
      .groupBy('topic cast "string").count()
      .coalesce(1)
      .as[String]
      .writeStream
      .trigger(ProcessingTime("60 seconds"))
      .option("checkpointLocation", checkpointUrl)
      .foreach(new SimpleSqlServerSink(jdbcUrl, connectionProperties))

thanks
rohit
User class threw exception: java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
Hi Team, I am working on structured streaming. I have added all the libraries in build.sbt, but it is still not picking up the right library and is failing with the error:

    User class threw exception: java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html

I am using Jenkins to deploy this task. Thanks, Amit
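For comparison, the build.sbt entries I believe are needed (the version is a placeholder and must match the cluster's Spark and Scala versions; if the job ships as an assembly jar, the Kafka source must end up inside it, or be supplied with --packages at submit time):

    // build.sbt: the Kafka source for Structured Streaming lives in its own
    // artifact; spark-sql alone does not contain it.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql"            % "2.3.0" % "provided", // placeholder version
      "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0"               // keep in the fat jar
    )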
how to call stored procedure from spark
Hi Guys, I have a stored procedure which does a transformation and writes to a SQL Server table. I am not able to execute this through Spark. Is there any way a Spark streaming sink can just call this stored procedure?
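A minimal sketch of the shape I have in mind, calling the procedure over plain JDBC from a ForeachWriter (the connection string and procedure name are placeholders; the SQL Server JDBC driver is assumed to be on the classpath):

    import java.sql.{Connection, DriverManager}
    import org.apache.spark.sql.{ForeachWriter, Row}

    // Opens one connection per partition and calls the procedure per row;
    // used as .writeStream.foreach(new StoredProcSink(jdbcUrl)).
    class StoredProcSink(jdbcUrl: String) extends ForeachWriter[Row] {
      private var conn: Connection = _

      override def open(partitionId: Long, epochId: Long): Boolean = {
        conn = DriverManager.getConnection(jdbcUrl)
        true
      }

      override def process(row: Row): Unit = {
        val stmt = conn.prepareCall("{call dbo.transform_and_load(?)}") // placeholder
        stmt.setString(1, row.getString(0))
        stmt.execute()
        stmt.close()
      }

      override def close(errorOrNull: Throwable): Unit =
        if (conn != null) conn.close()
    }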
How to bulk insert using spark streaming job
How can I bulk insert using a Spark streaming job?

Sent from my iPhone
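To make the question concrete, a sketch of the kind of write I am after, assuming a JDBC target and that each micro-batch is written in one go (URL, table, and batch size are placeholders):

    import org.apache.spark.sql.DataFrame

    // Write a whole micro-batch with JDBC batching instead of one insert
    // per record; intended to be called from a streaming sink such as
    // foreachBatch.
    def bulkInsert(batch: DataFrame): Unit =
      batch.write
        .format("jdbc")
        .option("url", "jdbc:sqlserver://host:1433;database=db") // placeholder
        .option("dbtable", "dbo.events")                         // placeholder
        .option("batchsize", "10000")
        .mode("append")
        .save()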
Re: optimize hive query to move a subset of data from one partition table to another table
Hi,

    create table emp as
    select * from emp_full
    where join_date >= date_sub(join_date, 2)

I am trying to select from one table and insert into another table. I need a way to select the last 2 months of data every time; the table is partitioned on year, month, day.

On Sun, Feb 11, 2018 at 4:30 PM, Richard Qiao <richardqiao2...@gmail.com> wrote:
> Would you mind sharing your code with us to analyze?
>
> > On Feb 10, 2018, at 10:18 AM, amit kumar singh <amitiem...@gmail.com> wrote:
> >
> > Hi Team,
> >
> > We have a Hive external table which has 50 TB of data partitioned on
> > year, month, day.
> >
> > I want to move the last 2 months of data into another table.
> >
> > When I try to do this through Spark, more than 120k tasks are getting
> > created.
> >
> > What is the best way to do this?
> >
> > thanks
> > Rohit
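What I think I actually need is a predicate on the partition columns themselves. A sketch, assuming the partition columns are zero-padded strings named year, month, day (table names are from above; add_months and date_format are standard Spark SQL functions):

    // Filtering on the partition columns lets Spark prune partitions instead
    // of scanning the full 50 TB and spawning 120k+ tasks.
    spark.sql("""
      INSERT INTO emp
      SELECT * FROM emp_full
      WHERE concat(year, lpad(month, 2, '0'), lpad(day, 2, '0'))
            >= date_format(add_months(current_date(), -2), 'yyyyMMdd')
    """)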
optimize hive query to move a subset of data from one partition table to another table
Hi Team, we have a Hive external table which has 50 TB of data partitioned on year, month, day. I want to move the last 2 months of data into another table. When I try to do this through Spark, more than 120k tasks are getting created. What is the best way to do this? Thanks, Rohit
How to configure spark with java
Hello everyone, I want to use Spark with the Java API. Please let me know how I can configure it. Thanks, A
Re: type issue: found RDD[T] expected RDD[A]
Hi Evan, Patrick and Tobias,

So, it worked for what I needed it to do. I followed Yana's suggestion of using a parameterized type of [T <: Product : ClassTag : TypeTag]. More concretely, I was trying to make the query process a bit more fluent (some pseudocode, but with correct types):

    val table: SparkTable[POJO] = new SparkTable[POJO](sqlContext, extractor: (String) => POJO)
    val data = table.atLocation("hdfs://...")
      .withName(tableName)
      .makeRDD("SELECT * FROM tableName")

    class SparkTable[T <: Product : ClassTag : TypeTag](val sqlContext: SQLContext,
        val extractor: (String) => T) {
      private[this] var location: Option[String] = None
      private[this] var name: Option[String] = None
      private[this] val sc = sqlContext.sparkContext

      def withName(name: String): SparkTable[T] = { .. }
      def atLocation(path: String): SparkTable[T] = { .. }

      def makeRDD(sqlQuery: String): SchemaRDD = {
        ...
        import sqlContext._
        val rdd: RDD[String] = sc.textFile(this.location.get)
        val rddT: RDD[T] = rdd.map(extractor)
        val schemaRDD = createSchemaRDD(rddT)
        schemaRDD.registerAsTable(name.get)
        val all = sqlContext.sql(sqlQuery)
        all
      }
    }

Best,
Amit

On Tue, Aug 19, 2014 at 9:13 PM, Evan Chan <velvia.git...@gmail.com> wrote:

That might not be enough. Reflection is used to determine what the fields are, thus your class might actually need to have members corresponding to the fields in the table. I heard that a more generic method of inputting stuff is coming.

On Tue, Aug 19, 2014 at 6:43 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:

Hi,

On Tue, Aug 19, 2014 at 7:01 PM, Patrick McGloin <mcgloin.patr...@gmail.com> wrote:

I think the type of the data contained in your RDD needs to be a known case class and not abstract for createSchemaRDD. This makes sense when you think it needs to know about the fields in the object to create the schema.

Exactly this. The actual message pointing to that is:

    inferred type arguments [T] do not conform to method createSchemaRDD's
    type parameter bounds [A <: Product]

All case classes are automatically subclasses of Product, but otherwise you will have to extend Product and add the required methods yourself.

Tobias
type issue: found RDD[T] expected RDD[A]
Hi All,

I am having some trouble trying to write generic code that uses sqlContext and RDDs. Can you suggest what might be wrong?

    class SparkTable[T : ClassTag](val sqlContext: SQLContext, val extractor: (String) => T) {
      private[this] var location: Option[String] = None
      private[this] var name: Option[String] = None
      private[this] val sc = sqlContext.sparkContext
      ...
      def makeRDD(sqlQuery: String): SchemaRDD = {
        require(this.location != None)
        require(this.name != None)
        import sqlContext._
        val rdd: RDD[String] = sc.textFile(this.location.get)
        val rddT: RDD[T] = rdd.map(extractor)
        val schemaRDD: SchemaRDD = createSchemaRDD(rddT)
        schemaRDD.registerAsTable(name.get)
        val all = sqlContext.sql(sqlQuery)
        all
      }
    }

I use it as below:

    def extractor(line: String): POJO = {
      val splits = line.split(pattern).toList
      POJO(splits(0), splits(1), splits(2), splits(3))
    }

    val pojoTable: SparkTable[POJO] = new SparkTable[POJO](sqlContext, extractor)

    val identityData: SchemaRDD =
      pojoTable.atLocation("hdfs://location/table")
        .withName("pojo")
        .makeRDD("SELECT * FROM pojo")

I get a compilation failure:

    inferred type arguments [T] do not conform to method createSchemaRDD's
    type parameter bounds [A <: Product]
    [error] val schemaRDD: SchemaRDD = createSchemaRDD(rddT)
    [error]                            ^
    [error] SparkTable.scala:37: type mismatch;
    [error]  found   : org.apache.spark.rdd.RDD[T]
    [error]  required: org.apache.spark.rdd.RDD[A]
    [error] val schemaRDD: SchemaRDD = createSchemaRDD(rddT)
    [error]                            ^
    [error] two errors found

I am probably missing something basic, either in Scala reflection/types or implicits. Any hints would be appreciated.

Thanks
Amit