Re: Why my spark job STATE--> Running FINALSTATE --> Undefined.
Hi Shyam,

It would help if you mentioned which --master URL you are using. Is the job running on YARN, Mesos, or a standalone Spark cluster?

That said, I faced a similar issue in my earlier trials with Spark, where I created connections to several external databases, such as Cassandra, inside the driver (the main program of my app). After the job completed, my main program/driver task never finished; after debugging, I found the cause to be the open sessions with Cassandra. Closing those connections at the end of my main program resolved the problem. As you can guess, this issue was independent of the cluster manager used.

Akshay Bhardwaj
+91-97111-33849

On Tue, Jun 11, 2019 at 7:41 PM Shyam P wrote:
> Hi,
> Any clue why a Spark job goes into UNDEFINED state?
>
> More details are at this URL:
> https://stackoverflow.com/questions/56545644/why-my-spark-sql-job-stays-in-state-runningfinalstatus-undefined
>
> Appreciate your help.
>
> Regards,
> Shyam
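Akshay's fix above can be sketched in plain Python. FakeSession below is a hypothetical stand-in for a real driver-side session object (e.g. a Cassandra Session); the point is the try/finally shape, which guarantees every connection opened in the driver is closed before main exits, so no non-daemon threads keep the driver alive and YARN can record a final state:

```python
class FakeSession:
    """Hypothetical stand-in for an external-database session (e.g. Cassandra)."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def run_job(sessions):
    try:
        # ... run Spark actions that read/write through the sessions ...
        pass
    finally:
        # Close every session even if the job body raises, so the driver
        # process can actually exit.
        for s in sessions:
            s.close()

sessions = [FakeSession(), FakeSession()]
run_job(sessions)
print(all(s.closed for s in sessions))  # True
```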
Why my spark job STATE--> Running FINALSTATE --> Undefined.
Hi,

Any clue why a Spark job goes into UNDEFINED state?

More details are at this URL:
https://stackoverflow.com/questions/56545644/why-my-spark-sql-job-stays-in-state-runningfinalstatus-undefined

Appreciate your help.

Regards,
Shyam
Re: Undefined function json_array_to_map
Hi Ted/All,

I did the following to get the full stack trace, but I am still not able to understand the root cause:

except Exception as error:
    traceback.print_exc()

and this is what I get:

  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 580, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u'undefined function json_array_to_map; line 28 pos 73'

On Wed, Aug 17, 2016 at 8:59 AM, vr spark wrote:
> spark 1.6.1
> python
>
> I0817 08:51:59.099356 15189 detector.cpp:481] A new leading master (UPID=master@10.224.167.25:5050) is detected
> I0817 08:51:59.099735 15188 sched.cpp:262] New master detected at master@x.y.17.25:4550
> I0817 08:51:59.100888 15188 sched.cpp:272] No credentials provided. Attempting to register without authentication
> I0817 08:51:59.326017 15190 sched.cpp:641] Framework registered with b859f266-9984-482d-8c0d-35bd88c1ad0a-6996
> 16/08/17 08:52:06 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
> 16/08/17 08:52:06 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
> Traceback (most recent call last):
>   File "/data1/home/vttrich/spk/orig_qryhubb.py", line 17, in <module>
>     res=sqlcont.sql("select parti_date FROM log_data WHERE parti_date >= 408910 limit 10")
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 580, in sql
>   File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
> pyspark.sql.utils.AnalysisException: u'undefined function json_array_to_map; line 28 pos 73'
> I0817 08:52:12.840224 15600 sched.cpp:1771] Asked to stop the driver
> I0817 08:52:12.841198 15189 sched.cpp:1040] Stopping framework 'b859f2f3-7484-482d-8c0d-35bd91c1ad0a-6326'
>
> On Wed, Aug 17, 2016 at 8:50 AM, Ted Yu wrote:
>> Can you show the complete stack trace?
>>
>> Which version of Spark are you using?
>>
>> Thanks
>>
>> On Wed, Aug 17, 2016 at 8:46 AM, vr spark wrote:
>>> Hi,
>>> I am getting an error in the scenario below. Please suggest.
>>>
>>> I have a virtual view in Hive.
>>>
>>> view name: log_data
>>> it has 2 columns:
>>>   query_map map
>>>   parti_date int
>>>
>>> Here is my snippet for the Spark data frame:
>>>
>>> res=sqlcont.sql("select parti_date FROM log_data WHERE parti_date >= 408910 limit 10")
>>> df=res.collect()
>>> print 'after collect'
>>> print df
>>>
>>>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
>>> pyspark.sql.utils.AnalysisException: u'undefined function json_array_to_map; line 28 pos 73'
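The try/except pattern at the top of the message above can be made runnable as a small stand-alone sketch. fake_sql below is a hypothetical stand-in for sqlcont.sql(), raising the same message that PySpark's utils.py wraps into AnalysisException:

```python
import traceback

def fake_sql(query):
    # Hypothetical stand-in for sqlcont.sql(): raise the same message
    # that appeared in the real AnalysisException.
    raise Exception("undefined function json_array_to_map; line 28 pos 73")

trace = None
try:
    fake_sql("select parti_date FROM log_data WHERE parti_date >= 408910 limit 10")
except Exception:
    # format_exc() captures the full traceback as a string, which is easier
    # to log or grep than print_exc()'s direct write to stderr.
    trace = traceback.format_exc()
    print(trace)
```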
Re: Undefined function json_array_to_map
Can you show the complete stack trace?

Which version of Spark are you using?

Thanks

On Wed, Aug 17, 2016 at 8:46 AM, vr spark wrote:
> Hi,
> I am getting an error in the scenario below. Please suggest.
>
> I have a virtual view in Hive.
>
> view name: log_data
> it has 2 columns:
>   query_map map
>   parti_date int
>
> Here is my snippet for the Spark data frame:
>
> res=sqlcont.sql("select parti_date FROM log_data WHERE parti_date >= 408910 limit 10")
> df=res.collect()
> print 'after collect'
> print df
>
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
> pyspark.sql.utils.AnalysisException: u'undefined function json_array_to_map; line 28 pos 73'
Undefined function json_array_to_map
Hi,
I am getting an error in the scenario below. Please suggest.

I have a virtual view in Hive.

view name: log_data
it has 2 columns:
  query_map map
  parti_date int

Here is my snippet for the Spark data frame:

res=sqlcont.sql("select parti_date FROM log_data WHERE parti_date >= 408910 limit 10")
df=res.collect()
print 'after collect'
print df

  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: u'undefined function json_array_to_map; line 28 pos 73'
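"undefined function json_array_to_map" means Spark SQL could not resolve the function name: json_array_to_map is not a built-in, so it is presumably a custom (likely Hive) UDF that has to be registered with the context before the view can be queried. Its exact semantics are not shown in the thread; as a purely illustrative sketch, a function of that name might flatten a JSON array of single-entry objects into one map, something like:

```python
import json

def json_array_to_map(s):
    # Hypothetical pure-Python illustration only; the real json_array_to_map
    # is an unresolved UDF whose definition does not appear in the thread.
    result = {}
    for entry in json.loads(s):
        result.update(entry)
    return result

print(json_array_to_map('[{"a": 1}, {"b": 2}]'))
```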
Re: org.apache.spark.sql.AnalysisException: undefined function lit;
selectExpr just uses the SQL parser to interpret the string you give it, so to get a string literal you would use quotes:

df.selectExpr("*", "'" + time.milliseconds() + "' AS ms")

On Fri, Feb 12, 2016 at 6:19 PM, Andy Davidson <a...@santacruzintegration.com> wrote:
> I am trying to add a column with a constant value to my data frame. Any idea what I am doing wrong?
>
> Kind regards
>
> Andy
>
> DataFrame result = …
>
> String exprStr = "lit(" + time.milliseconds() + ") as ms";
> logger.warn("AEDWIP expr: {}", exprStr);
> result.selectExpr("*", exprStr).show(false);
>
> WARN 02:06:17 streaming-job-executor-0 c.p.f.s.s.CalculateAggregates$1 call line:96 AEDWIP expr: lit(1455329175000) as ms
> ERROR 02:06:17 JobScheduler o.a.s.Logging$class logError line:95 Error running job streaming job 1455329175000 ms.0
> org.apache.spark.sql.AnalysisException: undefined function lit;
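The reply above points out that selectExpr() hands its string to the SQL parser, whose function registry (in Spark of this vintage) does not contain the DataFrame-API helper lit(); the constant therefore has to be written as a SQL literal instead. A small pure-Python sketch of the strings involved (ms stands in for the value that time.milliseconds() returned in Andy's log):

```python
ms = 1455329175000

# What failed: "lit" is not in the SQL parser's function registry,
# hence "undefined function lit".
bad_expr = "lit(" + str(ms) + ") as ms"

# What works: quote the value so the parser sees a string literal ...
good_expr = "'" + str(ms) + "' AS ms"

# ... or leave it unquoted for a numeric literal.
numeric_expr = str(ms) + " AS ms"

print(good_expr)  # '1455329175000' AS ms
```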
Re: org.apache.spark.sql.AnalysisException: undefined function lit;
I've never done it that way, but you can simply use the withColumn method on DataFrames to do it.

On 13 Feb 2016 2:19 a.m., "Andy Davidson" wrote:
> I am trying to add a column with a constant value to my data frame. Any idea what I am doing wrong?
>
> Kind regards
>
> Andy
>
> DataFrame result = …
>
> String exprStr = "lit(" + time.milliseconds() + ") as ms";
> logger.warn("AEDWIP expr: {}", exprStr);
> result.selectExpr("*", exprStr).show(false);
>
> WARN 02:06:17 streaming-job-executor-0 c.p.f.s.s.CalculateAggregates$1 call line:96 AEDWIP expr: lit(1455329175000) as ms
> ERROR 02:06:17 JobScheduler o.a.s.Logging$class logError line:95 Error running job streaming job 1455329175000 ms.0
> org.apache.spark.sql.AnalysisException: undefined function lit;
org.apache.spark.sql.AnalysisException: undefined function lit;
I am trying to add a column with a constant value to my data frame. Any idea what I am doing wrong?

Kind regards

Andy

DataFrame result = …

String exprStr = "lit(" + time.milliseconds() + ") as ms";
logger.warn("AEDWIP expr: {}", exprStr);
result.selectExpr("*", exprStr).show(false);

WARN 02:06:17 streaming-job-executor-0 c.p.f.s.s.CalculateAggregates$1 call line:96 AEDWIP expr: lit(1455329175000) as ms
ERROR 02:06:17 JobScheduler o.a.s.Logging$class logError line:95 Error running job streaming job 1455329175000 ms.0
org.apache.spark.sql.AnalysisException: undefined function lit;
Undefined job output-path error in Spark on hive
Hi,

I am getting the following exception in Spark while writing to a Hive partitioned table in Parquet format:

16/01/25 03:56:40 ERROR executor.Executor: Exception in task 0.2 in stage 1.0 (TID 3)
java.io.IOException: Undefined job output-path
        at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:232)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWriterContainer$$newWriter$1(hiveWriterContainers.scala:237)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:250)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:250)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:250)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:112)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:104)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:85)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:85)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

Spark version: 1.5.0

Please let me know if anybody has an idea about this error.

Thanks,
Akhilesh
Re: HDFS is undefined
Please post the question on the vendor's forum.

> On Sep 25, 2015, at 7:13 AM, Angel Angel wrote:
>
> hello,
> I am running the spark application.
>
> I have installed the cloudera manager; it includes spark version 1.2.0.
>
> But now I want to use spark version 1.4.0, which is also working fine.
>
> But when I try to access HDFS in spark 1.4.0 in eclipse, I am getting the following error:
>
> "Exception in thread "main" java.nio.file.FileSystemNotFoundException: Provider "hdfs" not installed"
>
> My spark 1.4.0 spark-env.sh file is:
>
> export HADOOP_CONF_DIR=/etc/hadoop/conf
> export SPARK_HOME=/root/spark-1.4.0
> export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hadoop
>
> Still I am getting the error.
>
> Please give me suggestions.
>
> Thanking You,
> Sagar Jadhav.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: HDFS is undefined
For some reason Spark isn't picking up your Hadoop confs. Did you download a Spark build compiled against the Hadoop version you are running on the cluster?

Thanks
Best Regards

On Fri, Sep 25, 2015 at 7:43 PM, Angel Angel wrote:
> hello,
> I am running the spark application.
>
> I have installed the cloudera manager; it includes spark version 1.2.0.
>
> But now I want to use spark version 1.4.0, which is also working fine.
>
> But when I try to access HDFS in spark 1.4.0 in eclipse, I am getting the following error:
>
> "Exception in thread "main" java.nio.file.FileSystemNotFoundException: Provider "hdfs" not installed"
>
> My spark 1.4.0 spark-env.sh file is:
>
> export HADOOP_CONF_DIR=/etc/hadoop/conf
> export SPARK_HOME=/root/spark-1.4.0
> export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hadoop
>
> Still I am getting the error.
>
> Please give me suggestions.
>
> Thanking You,
> Sagar Jadhav.
HDFS is undefined
hello,
I am running the spark application.

I have installed the cloudera manager; it includes spark version 1.2.0.

But now I want to use spark version 1.4.0, which is also working fine.

But when I try to access HDFS in spark 1.4.0 in eclipse, I am getting the following error:

"Exception in thread "main" java.nio.file.FileSystemNotFoundException: Provider "hdfs" not installed"

My spark 1.4.0 spark-env.sh file is:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/root/spark-1.4.0
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hadoop

Still I am getting the error.

Please give me suggestions.

Thanking You,
Sagar Jadhav.
Re: sqlContext is undefined in the Spark Shell
This is noise, please ignore. I figured out what happened: the SQLContext was bound to the val cxt, so "import sqlContext._" refers to a name that was never defined; the second step should have been "import cxt._".

bit1...@163.com

From: bit1...@163.com
Date: 2015-01-03 19:03
To: user
Subject: sqlContext is undefined in the Spark Shell

Hi,
In the spark shell, I do the following two things:

1. scala> val cxt = new org.apache.spark.sql.SQLContext(sc);
2. scala> import sqlContext._

The 1st one succeeds while the 2nd one fails with the following error:

:10: error: not found: value sqlContext
       import sqlContext._

Is there something missing? I am using Spark 1.2.0.

Thanks.
bit1...@163.com
sqlContext is undefined in the Spark Shell
Hi,
In the spark shell, I do the following two things:

1. scala> val cxt = new org.apache.spark.sql.SQLContext(sc);
2. scala> import sqlContext._

The 1st one succeeds while the 2nd one fails with the following error:

:10: error: not found: value sqlContext
       import sqlContext._

Is there something missing? I am using Spark 1.2.0.

Thanks.
bit1...@163.com
undefined
Hi guys,

I ran the following command to launch a new cluster:

./spark-ec2 -k test -i test.pem -s 1 --vpc-id vpc-X --subnet-id subnet-X launch vpc_spark

The instances started OK, but the command never ends, with the following output:

Setting up security groups...
Searching for existing cluster vpc_spark...
Spark AMI: ami-5bb18832
Launching instances...
Launched 1 slaves in us-east-1a, regid = r-e9d603c4
Launched master in us-east-1a, regid = r-89d104a4
Waiting for cluster to enter 'ssh-ready' state...

Any ideas what happened?
Cannot submit Spark app to cluster, stuck on “UNDEFINED”
I use this command to submit a Spark application to a YARN cluster:

export YARN_CONF_DIR=conf
bin/spark-submit --class "Mining" --master yarn-cluster --executor-memory 512m ./target/scala-2.10/mining-assembly-0.1.jar

In the Web UI, it is stuck on UNDEFINED.

In the console, it is stuck at:

14/11/12 16:37:55 INFO yarn.Client: Application report from ASM:
  application identifier: application_1415704754709_0017
  appId: 17
  clientToAMToken: null
  appDiagnostics:
  appMasterHost: example.com
  appQueue: default
  appMasterRpcPort: 0
  appStartTime: 1415784586000
  yarnAppState: RUNNING
  distributedFinalState: UNDEFINED
  appTrackingUrl: http://example.com:8088/proxy/application_1415704754709_0017/
  appUser: rain

Update: Diving into the logs for the container in the Web UI (http://example.com:8042/node/containerlogs/container_1415704754709_0017_01_01/rain/stderr/?start=0), I found this:

14/11/12 02:11:47 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/12 02:11:47 DEBUG Client: IPC Client (1211012646) connection to spark.mvs.vn/192.168.64.142:8030 from rain sending #24418
14/11/12 02:11:47 DEBUG Client: IPC Client (1211012646) connection to spark.mvs.vn/192.168.64.142:8030 from rain got value #24418

I found that this problem has a solution here: http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/

"The Hadoop cluster must have sufficient memory for the request. For example, submitting the following job with 1 GB of memory allocated for the executor and the Spark driver fails with the above error in the HDP 2.1 Sandbox. Reduce the memory asked for the executor and the Spark driver to 512m and re-start the cluster."

I'm trying this solution, and hopefully it will work.
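Applying the tutorial's advice to the submit command at the top of this message would look something like the sketch below (the class and jar names are the ones from the original command; the 512m values come from the linked tutorial, and --driver-memory is the standard spark-submit flag for capping the driver as well):

```shell
# Re-submit with both driver and executor memory capped at 512m, so the
# request fits inside a small YARN cluster (e.g. the HDP 2.1 Sandbox).
export YARN_CONF_DIR=conf
bin/spark-submit \
  --class "Mining" \
  --master yarn-cluster \
  --driver-memory 512m \
  --executor-memory 512m \
  ./target/scala-2.10/mining-assembly-0.1.jar
```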