Re: Why my spark job STATE--> Running FINALSTATE --> Undefined.

2019-06-11 Thread Akshay Bhardwaj
Hi Shyam,

It would help if you mention what you are using as the --master URL. Is
it running on YARN, Mesos, or a standalone Spark cluster?

However, I faced a similar issue in my earlier trials with Spark, in which I
created connections to a lot of external databases, like Cassandra, inside
the driver (the main program of my app).
After the job completed, my main program/driver task never finished; after
debugging, I found the reason to be the open sessions with Cassandra.
Closing those connections at the end of my main program resolved
the problem. As you can guess, this issue was independent of the
cluster manager used.
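
A minimal sketch of that fix, assuming the DataStax Java driver is the Cassandra client in use (the contact point below is a placeholder):

import com.datastax.driver.core.Cluster

object MyDriverApp {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("10.0.0.1").build()  // placeholder host
    val session = cluster.connect()
    try {
      // ... run the Spark job, using `session` for any direct Cassandra access ...
    } finally {
      // Closing the session and the cluster lets the driver JVM exit; otherwise their
      // non-daemon threads can keep the main program alive after the Spark job completes.
      session.close()
      cluster.close()
    }
  }
}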


Akshay Bhardwaj
+91-97111-33849


On Tue, Jun 11, 2019 at 7:41 PM Shyam P  wrote:

> Hi,
> Any clue why spark job goes into UNDEFINED state ?
>
> More detail are in the url.
>
> https://stackoverflow.com/questions/56545644/why-my-spark-sql-job-stays-in-state-runningfinalstatus-undefined
>
>
> Appreciate your help.
>
> Regards,
> Shyam
>


Why my spark job STATE--> Running FINALSTATE --> Undefined.

2019-06-11 Thread Shyam P
Hi,
Any clue why a Spark job goes into the UNDEFINED state?

More details are at the URL below.
https://stackoverflow.com/questions/56545644/why-my-spark-sql-job-stays-in-state-runningfinalstatus-undefined


Appreciate your help.

Regards,
Shyam


Re: Undefined function json_array_to_map

2016-08-17 Thread vr spark
Hi Ted/All,
I did the following to get the full stack trace, but I am still not able to understand the root
cause:

except Exception as error:

traceback.print_exc()

and this is what I get:


 File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/context.py",
line 580, in sql

return DataFrame(self._ssql_ctx.sql(sqlQuery), self)

  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
line 813, in __call__

answer, self.gateway_client, self.target_id, self.name)

  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line
51, in deco

raise AnalysisException(s.split(': ', 1)[1], stackTrace)

AnalysisException: u'undefined function json_array_to_map; line 28 pos 73'
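
One possible fix, assuming json_array_to_map is a custom Hive UDF used inside the log_data view and simply not registered in this Spark SQL session: register the jar and the function before querying the view. The jar path and class name below are placeholders, and the sketch is in Scala for brevity, but the same two SQL statements work from a PySpark HiveContext.

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // sc is the existing SparkContext
hiveContext.sql("ADD JAR /path/to/custom-udfs.jar")  // placeholder jar containing the UDF
hiveContext.sql("CREATE TEMPORARY FUNCTION json_array_to_map AS 'com.example.udf.JsonArrayToMap'")  // placeholder class
val res = hiveContext.sql("select parti_date FROM log_data WHERE parti_date >= 408910 limit 10")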

On Wed, Aug 17, 2016 at 8:59 AM, vr spark  wrote:

> spark 1.6.1
> python
>
> I0817 08:51:59.099356 15189 detector.cpp:481] A new leading master (UPID=
> master@10.224.167.25:5050) is detected
> I0817 08:51:59.099735 15188 sched.cpp:262] New master detected at
> master@x.y.17.25:4550
> I0817 08:51:59.100888 15188 sched.cpp:272] No credentials provided.
> Attempting to register without authentication
> I0817 08:51:59.326017 15190 sched.cpp:641] Framework registered with
> b859f266-9984-482d-8c0d-35bd88c1ad0a-6996
> 16/08/17 08:52:06 WARN ObjectStore: Version information not found in
> metastore. hive.metastore.schema.verification is not enabled so recording
> the schema version 1.2.0
> 16/08/17 08:52:06 WARN ObjectStore: Failed to get database default,
> returning NoSuchObjectException
> Traceback (most recent call last):
>   File "/data1/home/vttrich/spk/orig_qryhubb.py", line 17, in 
> res=sqlcont.sql("select parti_date FROM log_data WHERE parti_date  >=
> 408910 limit 10")
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/context.py",
> line 580, in sql
>   File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
> line 813, in __call__
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py",
> line 51, in deco
> pyspark.sql.utils.AnalysisException: u'undefined function
> json_array_to_map; line 28 pos 73'
> I0817 08:52:12.840224 15600 sched.cpp:1771] Asked to stop the driver
> I0817 08:52:12.841198 15189 sched.cpp:1040] Stopping framework
> 'b859f2f3-7484-482d-8c0d-35bd91c1ad0a-6326'
>
>
> On Wed, Aug 17, 2016 at 8:50 AM, Ted Yu  wrote:
>
>> Can you show the complete stack trace ?
>>
>> Which version of Spark are you using ?
>>
>> Thanks
>>
>> On Wed, Aug 17, 2016 at 8:46 AM, vr spark  wrote:
>>
>>> Hi,
>>> I am getting error on below scenario. Please suggest.
>>>
>>> i have  a virtual view in hive
>>>
>>> view name log_data
>>> it has 2 columns
>>>
>>> query_map   map
>>>
>>> parti_date int
>>>
>>>
>>> Here is my snippet for the spark data frame
>>>
>>> my dataframe
>>>
>>> res=sqlcont.sql("select parti_date FROM log_data WHERE parti_date  >=
>>> 408910 limit 10")
>>>
>>> df=res.collect()
>>>
>>> print 'after collect'
>>>
>>> print df
>>>
>>>
>>> * File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py",
>>> line 51, in deco*
>>>
>>> *pyspark.sql.utils.AnalysisException: u'undefined function
>>> json_array_to_map; line 28 pos 73'*
>>>
>>>
>>>
>>>
>>>
>>
>


Re: Undefined function json_array_to_map

2016-08-17 Thread Ted Yu
Can you show the complete stack trace?

Which version of Spark are you using?

Thanks

On Wed, Aug 17, 2016 at 8:46 AM, vr spark  wrote:

> Hi,
> I am getting error on below scenario. Please suggest.
>
> i have  a virtual view in hive
>
> view name log_data
> it has 2 columns
>
> query_map   map
>
> parti_date int
>
>
> Here is my snippet for the spark data frame
>
> my dataframe
>
> res=sqlcont.sql("select parti_date FROM log_data WHERE parti_date  >=
> 408910 limit 10")
>
> df=res.collect()
>
> print 'after collect'
>
> print df
>
>
> * File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py",
> line 51, in deco*
>
> *pyspark.sql.utils.AnalysisException: u'undefined function
> json_array_to_map; line 28 pos 73'*
>
>
>
>
>


Undefined function json_array_to_map

2016-08-17 Thread vr spark
Hi,
I am getting an error in the scenario below. Please suggest.

I have a virtual view in Hive:

view name: log_data
it has 2 columns:

query_map   map

parti_date int


Here is my snippet for my Spark data frame:

res=sqlcont.sql("select parti_date FROM log_data WHERE parti_date  >=
408910 limit 10")

df=res.collect()

print 'after collect'

print df


* File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line
51, in deco*

*pyspark.sql.utils.AnalysisException: u'undefined function
json_array_to_map; line 28 pos 73'*


Re: org.apache.spark.sql.AnalysisException: undefined function lit;

2016-02-13 Thread Michael Armbrust
selectExpr just uses the SQL parser to interpret the string you give it.
So to get a string literal you would use quotes:

df.selectExpr("*", "'" + time.milliseconds() + "' AS ms")

On Fri, Feb 12, 2016 at 6:19 PM, Andy Davidson <
a...@santacruzintegration.com> wrote:

> I am trying to add a column with a constant value to my data frame. Any
> idea what I am doing wrong?
>
> Kind regards
>
> Andy
>
>
>  DataFrame result = …
>
>  String exprStr = "lit(" + time.milliseconds()+ ") as ms";
>
>  logger.warn("AEDWIP expr: {}", exprStr);
>
>   result.selectExpr("*", exprStr).show(false);
>
> WARN  02:06:17 streaming-job-executor-0 c.p.f.s.s.CalculateAggregates$1
> call line:96 AEDWIP expr: lit(1455329175000) as ms
>
> ERROR 02:06:17 JobScheduler o.a.s.Logging$class logError line:95 Error
> running job streaming job 1455329175000 ms.0
>
> org.apache.spark.sql.AnalysisException: undefined function lit;
>
>
>
>


Re: org.apache.spark.sql.AnalysisException: undefined function lit;

2016-02-13 Thread Sebastian Piu
I've never done it that way, but you can simply use the withColumn method on
data frames to do it.
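
A minimal sketch of that approach in Scala, reusing the names from Andy's snippet (result is his DataFrame, and 1455329175000L is the batch time from his log); lit lives in org.apache.spark.sql.functions and is a DataFrame function, not something selectExpr's SQL parser knows about:

import org.apache.spark.sql.functions.lit

// Add the batch time as a constant column via the DataFrame API instead of selectExpr.
val withMs = result.withColumn("ms", lit(1455329175000L))
withMs.show(false)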
On 13 Feb 2016 2:19 a.m., "Andy Davidson" 
wrote:

> I am trying to add a column with a constant value to my data frame. Any
> idea what I am doing wrong?
>
> Kind regards
>
> Andy
>
>
>  DataFrame result = …
>
>  String exprStr = "lit(" + time.milliseconds()+ ") as ms";
>
>  logger.warn("AEDWIP expr: {}", exprStr);
>
>   result.selectExpr("*", exprStr).show(false);
>
> WARN  02:06:17 streaming-job-executor-0 c.p.f.s.s.CalculateAggregates$1
> call line:96 AEDWIP expr: lit(1455329175000) as ms
>
> ERROR 02:06:17 JobScheduler o.a.s.Logging$class logError line:95 Error
> running job streaming job 1455329175000 ms.0
>
> org.apache.spark.sql.AnalysisException: undefined function lit;
>
>
>
>


org.apache.spark.sql.AnalysisException: undefined function lit;

2016-02-12 Thread Andy Davidson
I am trying to add a column with a constant value to my data frame. Any idea
what I am doing wrong?

Kind regards

Andy


 DataFrame result = …
 String exprStr = "lit(" + time.milliseconds()+ ") as ms";

 logger.warn("AEDWIP expr: {}", exprStr);

  result.selectExpr("*", exprStr).show(false);


WARN  02:06:17 streaming-job-executor-0 c.p.f.s.s.CalculateAggregates$1 call
line:96 AEDWIP expr: lit(1455329175000) as ms

ERROR 02:06:17 JobScheduler o.a.s.Logging$class logError line:95 Error
running job streaming job 1455329175000 ms.0

org.apache.spark.sql.AnalysisException: undefined function lit;








Undefined job output-path error in Spark on hive

2016-01-25 Thread Akhilesh Pathodia
Hi,

I am getting the following exception in Spark while writing to a Hive partitioned
table in Parquet format:

16/01/25 03:56:40 ERROR executor.Executor: Exception in task 0.2 in
stage 1.0 (TID 3)
java.io.IOException: Undefined job output-path
at 
org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:232)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWriterContainer$$newWriter$1(hiveWriterContainers.scala:237)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:250)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:250)
at 
scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:250)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:112)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:104)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:85)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:85)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

Spark version: 1.5.0

Please let me know if anybody has an idea about this error.

Thanks,

Akhilesh


Re: HDFS is undefined

2015-09-28 Thread Ted Yu
Please post the question on the vendor's forum.

> On Sep 25, 2015, at 7:13 AM, Angel Angel  wrote:
> 
> hello,
> I am running the spark application.
> 
> I have installed the cloudera manager.
> it includes the spark version 1.2.0
> 
> 
> But now i want to use spark version 1.4.0.
> 
> its also working fine.
> 
> But when i try to access the HDFS in spark 1.4.0 in eclipse i am getting the 
> following error.
> 
> "Exception in thread "main" java.nio.file.FileSystemNotFoundException: 
> Provider "hdfs" not installed "
> 
> 
> My spark 1.4.0 spark-env.sh file is  
> 
> export HADOOP_CONF_DIR=/etc/hadoop/conf
> export SPARK_HOME=/root/spark-1.4.0
> 
> 
> export 
> DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hadoop
> 
> still i am getting the error.
> 
> please give me suggestions.
> 
> Thanking You,
> Sagar Jadhav. 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: HDFS is undefined

2015-09-28 Thread Akhil Das
For some reason Spark isn't picking up your Hadoop confs. Did you download
Spark compiled against the Hadoop version that you have in the cluster?

Thanks
Best Regards

On Fri, Sep 25, 2015 at 7:43 PM, Angel Angel 
wrote:

> hello,
> I am running the spark application.
>
> I have installed the cloudera manager.
> it includes the spark version 1.2.0
>
>
> But now i want to use spark version 1.4.0.
>
> its also working fine.
>
> But when i try to access the HDFS in spark 1.4.0 in eclipse i am getting
> the following error.
>
> "Exception in thread "main" java.nio.file.FileSystemNotFoundException:
> Provider "hdfs" not installed "
>
>
> My spark 1.4.0 spark-env.sh file is
>
> export HADOOP_CONF_DIR=/etc/hadoop/conf
> export SPARK_HOME=/root/spark-1.4.0
>
>
> export
> DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hadoop
>
> still i am getting the error.
>
> please give me suggestions.
>
> Thanking You,
> Sagar Jadhav.
>


HDFS is undefined

2015-09-25 Thread Angel Angel
Hello,
I am running a Spark application.

I have installed Cloudera Manager; it includes Spark version 1.2.0.

But now I want to use Spark version 1.4.0, and it is also working fine.

But when I try to access HDFS in Spark 1.4.0 from Eclipse, I get
the following error:

"Exception in thread "main" java.nio.file.FileSystemNotFoundException:
Provider "hdfs" not installed "


My Spark 1.4.0 spark-env.sh file is:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/root/spark-1.4.0


export
DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hadoop

I am still getting the error.

Please give me suggestions.

Thanking You,
Sagar Jadhav.
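
A guess at the cause, since the exception type is java.nio.file.FileSystemNotFoundException: the JDK's java.nio.file API has no provider for hdfs:// URIs, so resolving an HDFS path through java.nio.file.Paths or FileSystems fails with exactly this message regardless of the Hadoop setup. Spark's own APIs understand HDFS URIs directly; a minimal sketch with a placeholder path:

// sc is the application's SparkContext; the URI below is a placeholder.
val lines = sc.textFile("hdfs://namenode-host:8020/user/sagar/input.txt")
println(lines.count())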


Re: sqlContext is undefined in the Spark Shell

2015-01-03 Thread bit1...@163.com
This is noise, please ignore.

I figured out what happened.



bit1...@163.com
 
From: bit1...@163.com
Date: 2015-01-03 19:03
To: user
Subject: sqlContext is undefined in the Spark Shell
Hi,

In the spark shell, I do the following two things:

1. scala> val cxt = new org.apache.spark.sql.SQLContext(sc);
2. scala> import sqlContext._

The 1st one succeeds while the 2nd one fails with the following error,

:10: error: not found: value sqlContext 
import sqlContext._

Is there something missing? I am using Spark 1.2.0.

Thanks.




bit1...@163.com


sqlContext is undefined in the Spark Shell

2015-01-03 Thread bit1...@163.com
Hi,

In the Spark shell, I do the following two things:

1. scala> val cxt = new org.apache.spark.sql.SQLContext(sc);
2. scala> import sqlContext._

The first one succeeds while the second one fails with the following error:

:10: error: not found: value sqlContext 
import sqlContext._

Is there something missing? I am using Spark 1.2.0.

Thanks.
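
The likely explanation, going by the snippet above: the SQLContext was bound to a val named cxt, so there is no value called sqlContext in scope to import from. A minimal sketch of the fix:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> import sqlContext._

or, keeping the original name:

scala> import cxt._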




bit1...@163.com


undefined

2014-12-18 Thread Eduardo Cusa
Hi guys,

I ran the following command to launch a new cluster:

./spark-ec2 -k test -i test.pem -s 1  --vpc-id vpc-X --subnet-id
subnet-X launch  vpc_spark

The instances started OK, but the command never finishes. The output is:


Setting up security groups...
Searching for existing cluster vpc_spark...
Spark AMI: ami-5bb18832
Launching instances...
Launched 1 slaves in us-east-1a, regid = r-e9d603c4
Launched master in us-east-1a, regid = r-89d104a4
Waiting for cluster to enter 'ssh-ready' state...


Any ideas what happened?


Cannot submit Spark app to cluster, stuck on “UNDEFINED”

2014-11-12 Thread brother rain
I use this command to submit a *Spark application* to a *YARN cluster*:

export YARN_CONF_DIR=conf
bin/spark-submit --class "Mining" \
  --master yarn-cluster \
  --executor-memory 512m ./target/scala-2.10/mining-assembly-0.1.jar

*In the Web UI, it is stuck on* UNDEFINED


*In the console, it is stuck at*

14/11/12 16:37:55 INFO yarn.Client: Application report from ASM:
 application identifier: application_1415704754709_0017
 appId: 17
 clientToAMToken: null
 appDiagnostics:
 appMasterHost: example.com
 appQueue: default
 appMasterRpcPort: 0
 appStartTime: 1415784586000
 yarnAppState: RUNNING
 distributedFinalState: UNDEFINED
 appTrackingUrl:
http://example.com:8088/proxy/application_1415704754709_0017/
 appUser: rain


Update:

Diving into the logs for the container in the Web UI at
http://example.com:8042/node/containerlogs/container_1415704754709_0017_01_01/rain/stderr/?start=0,
I found this:

14/11/12 02:11:47 WARN YarnClusterScheduler: Initial job has not accepted
any resources; check your cluster UI to ensure that workers are registered
and have sufficient memory
14/11/12 02:11:47 DEBUG Client: IPC Client (1211012646) connection
tospark.mvs.vn/192.168.64.142:8030 from rain sending #24418
14/11/12 02:11:47 DEBUG Client: IPC Client (1211012646) connection
tospark.mvs.vn/192.168.64.142:8030 from rain got value #24418

I found that this problem has a solution here:
http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/

The Hadoop cluster must have sufficient memory for the request.

For example, submitting the following job with 1GB memory allocated for
executor and Spark driver fails with the above error in the HDP 2.1 Sandbox.
Reduce the memory asked for the executor and the Spark driver to 512m and
re-start the cluster.

I'm trying this solution and hopefully it will work.
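
A minimal sketch of the reduced-memory setup described above, in Scala, assuming the application class is the "Mining" one from the submit command. Executor memory can also be set on the SparkConf as shown; driver memory in yarn-cluster mode is normally given at submit time (e.g. --driver-memory 512m), since the driver JVM is already running by the time this code executes.

import org.apache.spark.{SparkConf, SparkContext}

object Mining {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Mining")
      .set("spark.executor.memory", "512m")  // keep the request within what the YARN cluster can grant
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}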