Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-18 Thread Mich Talebzadeh
Yes, it sounds like it. The broadcast DF size seems to be between 1 and 4GB, so I suggest that you leave it as it is. I have not used the standalone mode since spark-2.4.3, so I may be missing a fair bit of context here. I am sure there are others like you who are still using it! HTH Mich

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
No, the driver memory was not set explicitly. So it was likely the default value, which appears to be 1GB. On Thu, Aug 17, 2023, 16:49 Mich Talebzadeh wrote: > One question, what was the driver memory before setting it to 4G? Did you > have it set at all before? > > HTH > > Mich Talebzadeh, >

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
One question, what was the driver memory before setting it to 4G? Did you have it set at all before? HTH Mich Talebzadeh

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Mich, Here are my config values from spark-defaults.conf: spark.eventLog.enabled true spark.eventLog.dir hdfs://10.0.50.1:8020/spark-logs spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider spark.history.fs.logDirectory hdfs://10.0.50.1:8020/spark-logs

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
Hello Patrick, As a matter of interest, what parameters and their respective values do you use in spark-submit? I assume it is running in YARN mode. HTH Mich Talebzadeh

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Mich, Yes, that's the sequence of events. I think the big breakthrough is that (for now at least) Spark is throwing errors instead of the queries hanging. Which is a big step forward. I can at least troubleshoot issues if I know what they are. When I reflect on the issues I faced and the

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Mich Talebzadeh
Hi Patrick, glad that you have managed to sort this problem out. Hopefully it will go away for good. Still, we are in the dark about how this problem is going away and coming back :( As I recall, the chronology of events was as follows: 1. The issue with the hanging Spark job reported 2.

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-17 Thread Patrick Tucci
Hi Everyone, I just wanted to follow up on this issue. This issue has continued since our last correspondence. Today I had a query hang and couldn't resolve the issue. I decided to upgrade my Spark install from 3.4.0 to 3.4.1. After doing so, instead of the query hanging, I got an error message

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-13 Thread Mich Talebzadeh
OK, I use Hive 3.1.1. My suggestion is to put your Hive issues, and the Java version compatibility question, to u...@hive.apache.org. They will give you better info. HTH Mich Talebzadeh

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-13 Thread Patrick Tucci
I attempted to install Hive yesterday. The experience was similar to other attempts at installing Hive: it took a few hours and at the end of the process, I didn't have a working setup. The latest stable release would not run. I never discovered the cause, but similar StackOverflow questions

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Mich Talebzadeh
OK, you would not have known unless you went through the process, so to speak. Let us do something revolutionary here: install Hive and its metastore. You already have Hadoop anyway. https://cwiki.apache.org/confluence/display/hive/adminmanual+installation hive metastore

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Patrick Tucci
Yes, on premise. Unfortunately after installing Delta Lake and re-writing all tables as Delta tables, the issue persists. On Sat, Aug 12, 2023 at 11:34 AM Mich Talebzadeh wrote: > ok sure. > > Is this Delta Lake going to be on-premise? > > Mich Talebzadeh, > Solutions Architect/Engineering

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Mich Talebzadeh
ok sure. Is this Delta Lake going to be on-premise? Mich Talebzadeh

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-12 Thread Patrick Tucci
Hi Mich, Thanks for the feedback. My original intention after reading your response was to stick to Hive for managing tables. Unfortunately, I'm running into another case of SQL scripts hanging. Since all tables are already Parquet, I'm out of troubleshooting options. I'm going to migrate to

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-11 Thread Mich Talebzadeh
Hi Patrick, There is nothing wrong with Hive. On-premise, it is the best data warehouse there is. Hive handles both the ORC and Parquet formats well; they are both columnar implementations of the relational model. What you are seeing is the Spark API to Hive, which prefers Parquet. I found out a few

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-11 Thread Patrick Tucci
Thanks for the reply Stephen and Mich. Stephen, you're right, it feels like Spark is waiting for something, but I'm not sure what. I'm the only user on the cluster and there are plenty of resources (60+ cores, 250+ GB RAM). I even tried restarting Hadoop, Spark and the host servers to make sure

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-11 Thread Mich Talebzadeh
Steve may have a valid point. You raised an issue with concurrent writes before, if I recall correctly; this limitation may be due to the Hive metastore. By default Spark uses Apache Derby for its database persistence. *However it is limited to only one Spark session at any time for the purposes

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Stephen Coy
Hi Patrick, When this has happened to me in the past (admittedly via spark-submit) it has been because another job was still running and had already claimed some of the resources (cores and memory). I think this can also happen if your configuration tries to claim resources that will never be

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Patrick Tucci
Hi Mich, I don't believe Hive is installed. I set up this cluster from scratch. I installed Hadoop and Spark by downloading them from their project websites. If Hive isn't bundled with Hadoop or Spark, I don't believe I have it. I'm running the Thrift server distributed with Spark, like so:

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
sorry host is 10.0.50.1 Mich Talebzadeh

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Hi Patrick, That beeline on port 1 is a Hive thrift server running on host 10.0.50.1:1. If you can access that host, you should be able to log into Hive by typing hive. The OS user is hadoop in your case, and it sounds like there is no password! Once inside that host, hive logs

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Patrick Tucci
Hi Mich, Thanks for the reply. Unfortunately I don't have Hive set up on my cluster. I can explore this if there are no other ways to troubleshoot. I'm using beeline to run commands against the Thrift server. Here's the command I use: ~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:1 -n

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Mich Talebzadeh
Can you run this sql query through hive itself? Are you using this command or similar for your thrift server? beeline -u jdbc:hive2:///1/default org.apache.hive.jdbc.HiveDriver -n hadoop -p xxx HTH Mich Talebzadeh

Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-10 Thread Patrick Tucci
Hello, I'm attempting to run a query on Spark 3.4.0 through the Spark ThriftServer. The cluster has 64 cores, 250GB RAM, and operates in standalone mode using HDFS for storage. The query is as follows: SELECT ME.*, MB.BenefitID FROM MemberEnrollment ME JOIN MemberBenefits MB ON ME.ID =

Spark SQL Query filter behavior with special characters

2022-07-25 Thread prashanth reddy
Hi Spark Community, Can you please help with the below query posted on Stackoverflow? https://stackoverflow.com/questions/73086256/spark-sql-query-filter-behavior-with-special-characters I am using the below spark sql query; however, it doesn't return any records unless I escape the "$"

Re: Spark SQL query

2021-02-03 Thread Mich Talebzadeh
On Wed, 3 Feb 2021 at 11:17, Arpan Bhandari wrote: > Yes Mich, > > Mapping the spark sql query that got executed corresponding to an > applicat

Re: Spark SQL query

2021-02-03 Thread Arpan Bhandari
Yes Mich, Mapping the spark sql query that got executed corresponding to an application Id on yarn would greatly help in analyzing and debugging the query for any potential problems. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-03 Thread Mich Talebzadeh
I gather what you are after is a code sniffer for Spark that provides a form of GUI to get the code that applications run against Spark. I don't think Spark has this type of plug-in, although it would be potentially useful. Some RDBMSs provide this, usually stored on some form of persistent storage

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Mich, The directory is already there and event logs are getting generated. I have checked them; they contain the query plan but not the actual query. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-02 Thread Mich Talebzadeh
Create a directory in hdfs: hdfs dfs -mkdir /spark_event_logs Modify the file $SPARK_HOME/conf/spark-defaults.conf and add these two lines: spark.eventLog.enabled=true # do not use quotes below spark.eventLog.dir=hdfs://rhes75:9000/spark_event_logs Then run a job and check it: hdfs dfs -ls

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Yes, I can see the jobs on 8088 and also on the spark history url. The spark history server is showing the plan details on the SQL tab but not giving the query. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Hi Mich, I do see the .scala_history file, but it contains all the queries which got executed up till now; if I have to map a specific query to an application Id in yarn, that would not correlate, hence this method alone won't suffice. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-02 Thread Mich Talebzadeh
Hi Arpan. I believe all applications including spark and scala create a hidden history file. You can go to the home directory: cd # see list of all hidden files: ls -a | egrep '^\.' If you are using scala, do you see a .scala_history file? HTH

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Hi Mich, Repeated the steps as suggested, but still there is no such folder created in the home directory. Do we need to enable some property so that it creates one? Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-02 Thread Arpan Bhandari
Sachit, It seems I have to do some sort of analysis from the plan to get the query. Appreciate all your help on this. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-01 Thread Mich Talebzadeh
Hi Arpan, log in as any user that has execution rights for spark. Type spark-shell, do some simple commands, then exit. Go to the home directory of that user and look for the hidden file ${HOME}/.spark_history; it will be there. HTH

Re: Spark SQL query

2021-02-01 Thread Sachit Murarka
Application-wise it won't show as such. You can try to correlate it with the explain plan output using some filters or attributes. Or else, if you do not have too many queries in history, just take the queries, find the plans of those queries, and match them with what is shown in the UI. I know that's a tedious task. But

Re: Spark SQL query

2021-02-01 Thread Arpan Bhandari
Sachit, That is showing all the queries that got executed, but how would it get mapped to the specific application Id it was associated with? Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-01 Thread Sachit Murarka
Hi Arpan, In spark shell, when you type :history, is it still not showing? Thanks Sachit On Mon, 1 Feb 2021, 21:13 Arpan Bhandari, wrote: > Hey Sachit, > > It shows the query plan, which is difficult to diagnose out and depict the > actual query. > > Thanks, > Arpan Bhandari

Re: Spark SQL query

2021-02-01 Thread Arpan Bhandari
Hey Mich, Thanks for the suggestions, but I don't see any such folder created on the edge node. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-02-01 Thread Arpan Bhandari
Hey Sachit, It shows the query plan, which is difficult to interpret and map back to the actual query. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-01-31 Thread Mich Talebzadeh
Hi Arpan, I presume you are interested in what the client was doing. If you have access to the edge node (where spark code is submitted), look for the following file: ${HOME}/.spark_history example -rw-r--r--. 1 hduser hadoop 111997 Jun 2 2018 .spark_history just use shell tools (cat, grep etc)

Re: Spark SQL query

2021-01-31 Thread Sachit Murarka
Hi Arpan, Launch spark shell and in the shell type ":history"; you will see the queries executed. In the Spark UI, under the SQL tab, you can see the query plan when you click on the details button (though it won't show you the complete query). But by looking at the plan you can get your query. Hope

Re: Spark SQL query

2021-01-29 Thread Arpan Bhandari
Hi Sachit, Yes it was executed using spark shell, and history is already enabled. I already checked the SQL tab but it is not showing the query. My spark version is 2.4.5. Thanks, Arpan Bhandari

Re: Spark SQL query

2021-01-29 Thread Sachit Murarka
Hi Arpan, Was it executed using spark shell? If yes, type :history. Do you have the history server enabled? If yes, go to the history and open the SQL tab in the History UI. Thanks Sachit On Fri, 29 Jan 2021, 19:19 Arpan Bhandari, wrote: > Hi , > > Is there a way to track back spark sql after it has

Spark SQL query

2021-01-29 Thread Arpan Bhandari
Hi, Is there a way to trace back spark sql after it has already been run, i.e. the query has already been submitted by a person and I have to trace back what query actually got submitted? Appreciate any help on this.

unix_timestamp() equivalent in plain Spark SQL Query

2020-04-02 Thread Aakash Basu
Hi, What is the unix_timestamp() function equivalent in a plain spark SQL query? I want to subtract one timestamp column from another, but in plain SQL I am getting the error "Should be numeric or calendarinterval and not timestamp." But when I did it through the above function inside
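
For reference, unix_timestamp() is also usable directly inside a plain Spark SQL query (Spark 1.5+), so the subtraction can be done without the DataFrame API. A minimal Scala sketch; the table and column names are illustrative, not from the original post:

    // difference between two timestamp columns, in seconds, via plain SQL;
    // "events", "start_ts" and "end_ts" are hypothetical names
    val diffs = spark.sql(
      "SELECT unix_timestamp(end_ts) - unix_timestamp(start_ts) AS diff_seconds FROM events")
    diffs.show()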

Re: Is there a way to validate the syntax of raw spark sql query?

2019-03-05 Thread kant kodali
t 10:23 PM kant kodali wrote: > >> Hi All, >> >> Is there a way to validate the syntax of raw spark SQL query? >> >> for example, I would like to know if there is any isValid API call spark >> provides? >> >> val query = "select * from table&

Re: Is there a way to validate the syntax of raw spark sql query?

2019-03-05 Thread Akshay Bhardwaj
> Hi All, > > Is there a way to validate the syntax of raw spark SQL query? > > for example, I would like to know if there is any isValid API call spark > provides? > > val query = "select * from table"if(isValid(query)) { > sparkSession.sql(query) } else {

Is there a way to validate the syntax of raw spark sql query?

2019-03-01 Thread kant kodali
Hi All, Is there a way to validate the syntax of a raw spark SQL query? For example, I would like to know if there is any isValid API call spark provides? val query = "select * from table"; if (isValid(query)) { sparkSession.sql(query) } else { log.error("Invalid Syn
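
One way to approximate the isValid call asked for here, assuming Spark 2.2+ where the session's parser is reachable from user code (spark below is the shell's SparkSession): parsing throws a ParseException on malformed SQL without executing anything. A sketch, not an official API:

    import scala.util.Try

    // syntax check only: parsePlan builds an unresolved logical plan, so it
    // catches parse errors but not missing tables/columns (those fail later,
    // at analysis time)
    def isValid(sql: String): Boolean =
      Try(spark.sessionState.sqlParser.parsePlan(sql)).isSuccess

    val query = "select * from table"
    if (isValid(query)) spark.sql(query) else println(s"invalid syntax: $query")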

Silly Spark SQL query

2019-01-28 Thread Aakash Basu
Hi, How to do this when the column (malignant and prediction) names are stored in two respective variables? tp = test_transformed[(test_transformed.malignant == 1) & (test_transformed.prediction == 1)].count() Thanks, Aakash.

Re: Silly Spark SQL query

2019-01-28 Thread Aakash Basu
Well, it is done. Using: ma = "malignant" pre = "prediction" tp_test = test_transformed.filter((col(ma) == "1") & (col(pre) == "1")).count() On Mon, Jan 28, 2019 at 5:41 PM Aakash Basu wrote: > Hi, > > How to do this when the column (malignant and prediction) names are stored > in two

Re: Java: pass parameters in spark sql query

2018-11-30 Thread 965
Sent: 2018-11-29 (Thu) 7:55 To: "user" Subject: Java: pass parameters in spark sql query Hello there, I am trying to pass parameters in spark.sql query in Java code, the same as in this link https://forums.databricks.com/questions/115/how-do-i-pass-parame

Re: Java: pass parameters in spark sql query

2018-11-28 Thread Ramandeep Singh
That's string interpolation. You could create your own named parameter, for example :bind, and then do replaceAll to substitute it. On Wed, Nov 28, 2018, 18:55 Mann Du wrote: > Hello there, > > I am trying to pass parameters in spark.sql query in Java code, the same > as in this link > >

Java: pass parameters in spark sql query

2018-11-28 Thread Mann Du
Hello there, I am trying to pass parameters in spark.sql query in Java code, the same as in this link https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html The link suggested to use 's' before 'select' as - val param = 100 spark.sql(s""" select * from
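
The s prefix in that link is plain Scala string interpolation: the variable is spliced into the SQL string before Spark ever sees it. A minimal sketch (table and column names are illustrative); Java has no interpolation, so String.format or concatenation plays the same role there:

    val param = 100
    // $param is substituted by Scala, not by Spark
    val df = spark.sql(s"SELECT * FROM events WHERE amount > $param")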

is there a way to parse and modify raw spark sql query?

2018-06-05 Thread kant kodali
Hi All, is there a way to parse and modify raw spark sql query? For example, given the following query spark.sql("select hello from view") I want to modify the query or logical plan such that I can get the result equivalent to the below query. spark.sql("select foo, hello f
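
There is no supported rewrite hook for raw SQL strings, but the parser itself is reachable, so one hedged approach (assuming Spark 2.2+; these are internal, unstable APIs) is to parse the text and transform the unresolved logical plan:

    import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
    import org.apache.spark.sql.catalyst.plans.logical.Project

    val plan = spark.sessionState.sqlParser.parsePlan("select hello from view")

    // prepend an extra column to the projection; "foo" is illustrative.
    // Turning the rewritten plan back into a DataFrame needs internal APIs.
    val rewritten = plan transform {
      case p: Project => p.copy(projectList = UnresolvedAttribute("foo") +: p.projectList)
    }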

RE: Spark-SQL Query Optimization: overlapping ranges

2017-05-01 Thread Lavelle, Shawn
Jacek, Thanks for your help. I didn’t want to write a bug/enhancement unless warranted. ~ Shawn From: Jacek Laskowski [mailto:ja...@japila.pl] Sent: Thursday, April 27, 2017 8:39 AM To: Lavelle, Shawn <shawn.lave...@osii.com> Cc: user <user@spark.apache.org> Subject: Re: Spa

Re: Spark-SQL Query Optimization: overlapping ranges

2017-04-27 Thread Jacek Laskowski
e probably going to write our own > org.apache.spark.sql.catalyst.rules.Rule to handle it. > > ~ Shawn > > > > *From:* Jacek Laskowski [mailto:ja...@japila.pl] > *Sent:* Wednesday, April 26, 2017 2:55 AM > *To:* Lavelle, Shawn <shawn.lave...@osii.com> > *Cc:* user

RE: Spark-SQL Query Optimization: overlapping ranges

2017-04-27 Thread Lavelle, Shawn
of thing. We’re probably going to write our own org.apache.spark.sql.catalyst.rules.Rule to handle it. ~ Shawn From: Jacek Laskowski [mailto:ja...@japila.pl] Sent: Wednesday, April 26, 2017 2:55 AM To: Lavelle, Shawn <shawn.lave...@osii.com> Cc: user <user@spark.apache.org> Subject: Re: Spa

Re: Spark-SQL Query Optimization: overlapping ranges

2017-04-26 Thread Jacek Laskowski
explain it and you'll know what happens under the covers. i.e. Use explain on the Dataset. Jacek On 25 Apr 2017 12:46 a.m., "Lavelle, Shawn" wrote: > Hello Spark Users! > >Does the Spark Optimization engine reduce overlapping column ranges? > If so, should it push

Spark-SQL Query Optimization: overlapping ranges

2017-04-24 Thread Lavelle, Shawn
Hello Spark Users! Does the Spark Optimization engine reduce overlapping column ranges? If so, should it push this down to a Data Source? Example, This: Select * from table where col between 3 and 7 OR col between 5 and 9 Reduces to: Select * from table where col between 3 and 9
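
As the replies above suggest, the quickest way to see what Catalyst does with the two ranges is to ask for the plan; the thread's conclusion (writing a custom catalyst Rule) implies no built-in rule collapses overlapping BETWEENs. A minimal sketch to verify on a Spark 2.x shell:

    val df = spark.range(0, 100).toDF("col")
    // the optimized logical plan shows whether the predicate was simplified
    df.filter("col between 3 and 7 or col between 5 and 9").explain(true)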

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
> Had a high level look into the code. Seems getHiveQlP

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Yong Zhang
From: Raju Bairishetti <r...@apache.org> Sent: Tuesday, January 17, 2017 3:00 AM To: user @spark Subject: Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided Had a high level look into the code.

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-17 Thread Raju Bairishetti
Had a high level look into the code. Seems getHiveQlPartitions method from HiveMetastoreCatalog is getting called irrespective of metastorePartitionPruning conf value. It should not fetch all partitions if we set metastorePartitionPruning to true (Default value for this is false) def
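
For anyone following along, the flag under discussion is set like this; a minimal sketch (the table name is illustrative, and whether pruning then actually happens is exactly what this thread disputes):

    // ask Spark to push partition predicates down into the metastore call
    sqlContext.setConf("spark.sql.hive.metastorePartitionPruning", "true")

    sqlContext.sql("SELECT * FROM partitioned_tbl WHERE dt = '2017-01-10'").explain(true)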

Re: Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-15 Thread Raju Bairishetti
Waiting for suggestions/help on this... On Wed, Jan 11, 2017 at 12:14 PM, Raju Bairishetti wrote: > Hello, > > Spark sql is generating the query plan with all partitions' information even > though we apply filters on partitions in the query. Due to this, the spark > driver/hive

Spark sql query plan contains all the partitions from hive table even though filtering of partitions is provided

2017-01-10 Thread Raju Bairishetti
Hello, Spark sql is generating the query plan with all partitions' information even though we apply filters on partitions in the query. Due to this, the spark driver/hive metastore is hitting OOM, as each table has lots of partitions. We can confirm from the hive audit logs that it tries to

Re: time to run Spark SQL query

2016-11-28 Thread ayan guha
They should take the same time if everything else is constant. On 28 Nov 2016 23:41, "Hitesh Goyal" wrote: > Hi team, I am using spark SQL for accessing the amazon S3 bucket data. > > If I run a sql query by using normal SQL syntax like below > > 1) DataFrame

time to run Spark SQL query

2016-11-28 Thread Hitesh Goyal
Hi team, I am using spark SQL for accessing the amazon S3 bucket data. If I run a sql query using normal SQL syntax like below: 1) DataFrame d = sqlContext.sql("Select * from tablename where column_condition"); Secondly, if I use dataframe functions for the same query like below: 2)
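
Both forms compile to the same Catalyst plan, which is the basis of the "same time" answer above; comparing the plans makes that concrete. A minimal sketch, with table and column names illustrative:

    import org.apache.spark.sql.functions.col

    val viaSql = sqlContext.sql("SELECT * FROM tablename WHERE colA = 'x'")
    val viaApi = sqlContext.table("tablename").filter(col("colA") === "x")

    // identical optimized/physical plans imply identical run time
    viaSql.explain(true)
    viaApi.explain(true)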

Re: How Spark sql query optimisation work if we are using .rdd action ?

2016-08-14 Thread Mich Talebzadeh
There are two distinct parts here: optimisation + execution. Spark does not have a Cost Based Optimizer (CBO) yet, but that does not matter for now. When we do such an operation, say an outer join between the (s) and (t) DFs below, we see scala> val rs = s.join(t,s("time_id")===t("time_id"),

Re: How Spark sql query optimisation work if we are using .rdd action ?

2016-08-14 Thread ayan guha
I do not think so. What I understand is that Spark will still use Catalyst to plan the join. A DF always has an RDD underneath, but that does not mean any action will force a less optimal path. On Sun, Aug 14, 2016 at 3:04 PM, mayur bhole wrote: > HI All, > > Lets say, we have > > val df =

How Spark sql query optimisation work if we are using .rdd action ?

2016-08-13 Thread mayur bhole
Hi All, Let's say we have val df = bigTableA.join(bigTableB, bigTableA("A")===bigTableB("A"), "left") val rddFromDF = df.rdd println(rddFromDF.count) My understanding is that spark will convert all data frame operations before "rddFromDF.count" into the equivalent RDD operations, as we are not
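
One way to check this empirically: the Catalyst plan is fixed on the DataFrame before .rdd is called, so it can be inspected directly. A self-contained sketch with toy tables standing in for bigTableA/bigTableB:

    val bigTableA = spark.range(0, 10).toDF("A")
    val bigTableB = spark.range(5, 15).toDF("A")
    val df = bigTableA.join(bigTableB, bigTableA("A") === bigTableB("A"), "left")

    // Catalyst has already optimised the join at this point
    println(df.queryExecution.optimizedPlan)

    // .rdd executes that same optimized plan; it does not bypass it
    println(df.rdd.toDebugString)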

Re: Spark SQL query for List

2016-04-26 Thread Ramkumar V
I'm getting the following exception if I form a query like this. It's not getting to the point where get(0) or get(1) runs. Exception in thread "main" java.lang.RuntimeException: [1.22] failure: ``*'' expected but `cities' found *Thanks*, On Tue, Apr 26, 2016 at

Re: Spark SQL query for List

2016-04-26 Thread Hyukjin Kwon
Doesn't get(0) give you the Array[String] for CITY (am I missing something?) On 26 Apr 2016 11:02 p.m., "Ramkumar V" wrote: JavaSparkContext ctx = new JavaSparkContext(sparkConf); SQLContext sqlContext = new SQLContext(ctx); DataFrame parquetFile =

Re: Spark SQL query for List

2016-04-26 Thread Ramkumar V
JavaSparkContext ctx = new JavaSparkContext(sparkConf); SQLContext sqlContext = new SQLContext(ctx); DataFrame parquetFile = sqlContext.parquetFile( "hdfs:/XYZ:8020/user/hdfs/parquet/*.parquet"); parquetFile.registerTempTable("parquetFile"); DataFrame tempDF =

Re: Spark SQL query for List

2016-04-26 Thread Hyukjin Kwon
Could you maybe share your codes? On 26 Apr 2016 9:51 p.m., "Ramkumar V" wrote: > Hi, > > I had loaded JSON file in parquet format into SparkSQL. I can't able to > read List which is inside JSON. > > Sample JSON > > { > "TOUR" : { > "CITIES" :

Spark SQL query for List

2016-04-26 Thread Ramkumar V
Hi, I had loaded a JSON file in parquet format into SparkSQL. I can't read a List which is inside the JSON. Sample JSON { "TOUR" : { "CITIES" : ["Paris","Berlin","Prague"] }, "BUDJET" : 100 } I want to read the value of CITIES. *Thanks*,
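
The nested array is reachable via explode on the struct field, which sidesteps the raw-SQL access that the old parser rejects elsewhere in the thread. A minimal sketch against a SQLContext of that era; the path is illustrative:

    import org.apache.spark.sql.functions.explode

    val df = sqlContext.read.json("/path/to/tour.json")
    // TOUR.CITIES is an array<string>; explode yields one row per city
    df.select(explode(df("TOUR.CITIES")).as("city")).show()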

Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread Mich Talebzadeh
Spark sql has both FLOOR and CEILING functions: spark-sql> select FLOOR(11.95), CEILING(11.95); 11.0 12.0 Dr Mich Talebzadeh

Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread Ajay Chander
Hi Ashok, Try using HiveContext instead of SQLContext. I suspect SQLContext does not have that functionality. Let me know if it works. Thanks, Ajay On Friday, March 4, 2016, ashokkumar rajendran < ashokkumar.rajend...@gmail.com> wrote: > Hi Ayan, > > Thanks for the response. I am using SQL

Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread ashokkumar rajendran
Hi Ayan, Thanks for the response. I am using a SQL query (not the Dataframe API). Could you please explain how I should import this sql function for it? Simply importing this class in my driver code does not help here. Many functions that I need are already there in sql.functions, so I do not want to

Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread ayan guha
Most likely you are missing import of org.apache.spark.sql.functions. In any case, you can write your own function for floor and use it as UDF. On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran < ashokkumar.rajend...@gmail.com> wrote: > Hi, > > I load json file that has timestamp (as long

Facing issue with floor function in spark SQL query

2016-03-04 Thread ashokkumar rajendran
Hi, I load a json file that has a timestamp (as long, in milliseconds) and several other attributes. I would like to group them by 5 minutes and store them as separate files. I am facing a couple of problems here: 1. Using the floor function at the select clause (to bucket by 5 mins) gives me an error saying
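
If the built-in floor isn't reachable from the plain SQL parser in that version, the UDF route suggested above does the 5-minute bucketing. A hedged sketch, assuming epoch-millisecond timestamps; table and column names are illustrative:

    // bucket an epoch-millis timestamp into its 5-minute window
    sqlContext.udf.register("floor_5min", (ts: Long) => ts - (ts % 300000L))

    val bucketed = sqlContext.sql(
      "SELECT floor_5min(event_ts) AS bucket, COUNT(*) AS n FROM events GROUP BY floor_5min(event_ts)")

Each bucket can then be written out separately, e.g. with write.partitionBy("bucket") on Spark 1.4+.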

Re: Spark sql query taking long time

2016-03-03 Thread Gourav Sengupta
Hi, using dataframes you can use SQL, and SQL has JOIN, BETWEEN, IN and LIKE operations. Why would someone use a dataframe and then use it as an RDD? :) Regards, Gourav Sengupta On Thu, Mar 3, 2016 at 4:28 PM, Sumedh Wale wrote: > On Thursday 03 March 2016

Re: Spark sql query taking long time

2016-03-03 Thread Sumedh Wale
On Thursday 03 March 2016 09:15 PM, Gourav Sengupta wrote: Hi, why not read the table into a dataframe directly using SPARK CSV package. You are trying to solve the problem the

Re: Spark sql query taking long time

2016-03-03 Thread Gourav Sengupta
Hi, why not read the table into a dataframe directly using the SPARK CSV package? You are trying to solve the problem in a roundabout way. Regards, Gourav Sengupta On Thu, Mar 3, 2016 at 12:33 PM, Sumedh Wale wrote: > On Thursday 03 March 2016 11:03 AM, Angel Angel wrote: >

Re: Spark sql query taking long time

2016-03-03 Thread Sumedh Wale
On Thursday 03 March 2016 11:03 AM, Angel Angel wrote: Hello Sir/Madam, I am writing one application using spark sql. i made the vary big table using the following command val

Re: Spark sql query taking long time

2016-03-02 Thread Ted Yu
Have you seen the thread 'Filter on a column having multiple values' where Michael gave this example ? https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/107522969592/2840265927289860/2388bac36e.html FYI On Wed, Mar 2, 2016 at

Spark sql query taking long time

2016-03-02 Thread Angel Angel
Hello Sir/Madam, I am writing one application using spark sql. I made a very big table using the following command: *val dfCustomers1 = sc.textFile("/root/Desktop/database.txt").map(_.split(",")).map(p => Customer1(p(0), p(1).trim.toInt, p(2).trim.toInt, p(3))).toDF* Now I want to search the

Spark SQL query AVRO file

2015-08-07 Thread java8964
Hi, Spark users: We are currently using Spark 1.2.2 + Hive 0.12 + Hadoop 2.2.0 on our production cluster, which has 42 data/task nodes. There is one dataset, stored as Avro files, of about 3T. Our business has a complex query running against the dataset, which is stored in a nested structure with an Array of

RE: Spark SQL query AVRO file

2015-08-07 Thread java8964
Have you considered trying Spark SQL's native support for avro data? https://github.com/databricks/spark-avro On Fri, Aug 7, 2015 at 11:30 AM, java8964 java8

Re: Spark SQL query AVRO file

2015-08-07 Thread Michael Armbrust
Have you considered trying Spark SQL's native support for avro data? https://github.com/databricks/spark-avro On Fri, Aug 7, 2015

RE: Spark SQL query AVRO file

2015-08-07 Thread java8964
Good to know that. Let me research it and give it a try. Thanks Yong > You can register your data as a table using this library and then query
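
A minimal sketch of the registration flow described here, written against the Spark 1.4+ read API (the thread runs on 1.2.2, where the package's own helpers fill the same role); the path and table name are illustrative:

    // load Avro files through the databricks spark-avro data source
    val events = sqlContext.read.format("com.databricks.spark.avro").load("/data/events")

    events.registerTempTable("events")
    sqlContext.sql("SELECT COUNT(*) FROM events").show()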

Re: Help optimising Spark SQL query

2015-06-30 Thread James Aley
the following Spark SQL query: select count(*) as uses, count (distinct cast(id as string)) as users from usage_events where from_unixtime(cast(timestamp_millis/1000 as bigint)) between '2015-06-09' and '2015-06-16' The table contains billions of rows, but totals only 64GB of data

Re: Help optimising Spark SQL query

2015-06-23 Thread Sabarish Sasidharan
query. It is also not necessary in your case. Getting rid of casts in the whole query will be also beneficial. Le lun. 22 juin 2015 à 17:29, James Aley james.a...@swiftkey.com a écrit : Hello, A colleague of mine ran the following Spark SQL query: select count(*) as uses, count

Re: Help optimising Spark SQL query

2015-06-23 Thread James Aley
query will be also beneficial. Le lun. 22 juin 2015 à 17:29, James Aley james.a...@swiftkey.com a écrit : Hello, A colleague of mine ran the following Spark SQL query: select count(*) as uses, count (distinct cast(id as string)) as users from usage_events where from_unixtime(cast

Help optimising Spark SQL query

2015-06-22 Thread James Aley
Hello, A colleague of mine ran the following Spark SQL query: select count(*) as uses, count (distinct cast(id as string)) as users from usage_events where from_unixtime(cast(timestamp_millis/1000 as bigint)) between '2015-06-09' and '2015-06-16' The table contains billions of rows
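
The replies above converge on dropping the per-row casts; one hedged way to do that is to move the range test onto the raw millisecond column so nothing is converted row by row. A sketch, not a tested equivalent (assumes Spark 1.5+ for the two-argument unix_timestamp):

    val optimised = sqlContext.sql("""
      SELECT count(*) AS uses,
             count(DISTINCT id) AS users  -- the cast to string looked defensive only
      FROM usage_events
      WHERE timestamp_millis BETWEEN unix_timestamp('2015-06-09', 'yyyy-MM-dd') * 1000
                                 AND unix_timestamp('2015-06-16', 'yyyy-MM-dd') * 1000
    """)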

RE: Help optimising Spark SQL query

2015-06-22 Thread Matthew Johnson
this issue, so might be worth upgrading if you are not already on 1.4. Cheers, Matthew *From:* Lior Chaga [mailto:lio...@taboola.com] *Sent:* 22 June 2015 17:24 *To:* James Aley *Cc:* user *Subject:* Re: Help optimising Spark SQL query Hi James, There are a few configurations that you

Re: Help optimising Spark SQL query

2015-06-22 Thread Lior Chaga
be an enormous performance improvement in dataframes. Lior On Mon, Jun 22, 2015 at 6:28 PM, James Aley james.a...@swiftkey.com wrote: Hello, A colleague of mine ran the following Spark SQL query: select count(*) as uses, count (distinct cast(id as string)) as users from usage_events

Re: Help optimising Spark SQL query

2015-06-22 Thread James Aley
are not already on 1.4. Cheers, Matthew *From:* Lior Chaga [mailto:lio...@taboola.com] *Sent:* 22 June 2015 17:24 *To:* James Aley *Cc:* user *Subject:* Re: Help optimising Spark SQL query Hi James, There are a few configurations that you can try: https://spark.apache.org/docs/latest

Re: Help optimising Spark SQL query

2015-06-22 Thread Ntale Lukama
ran the following Spark SQL query: select count(*) as uses, count (distinct cast(id as string)) as users from usage_events where from_unixtime(cast(timestamp_millis/1000 as bigint)) between '2015-06-09' and '2015-06-16' The table contains billions of rows, but totals only 64GB

Re: Help optimising Spark SQL query

2015-06-22 Thread Yin Huai
not necessary in your case. Getting rid of casts in the whole query will be also beneficial. Le lun. 22 juin 2015 à 17:29, James Aley james.a...@swiftkey.com a écrit : Hello, A colleague of mine ran the following Spark SQL query: select count(*) as uses, count (distinct cast(id as string
