Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-05 Thread ashokkumar rajendran
… Using I/O Size 64 Kbytes for data pages. With LRU Buffer Replacement Strategy for data pages. Total estimated I/O cost for statement 4 (at line 4): 2147483647. HTH, Dr Mich Talebzadeh …

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-01 Thread ashokkumar rajendran
… would be faster than the solution that you suggest. Now the question is, how should Spark decide when to do what? Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> www.snappydata.io. On Thu, Mar 31, 2016 at 2:28 PM, ashokkumar rajendran …

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-01 Thread ashokkumar rajendran
… So this looks like a bug! Dr Mich Talebzadeh, LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

SPARK-13900 - Join with simple OR conditions take too long

2016-03-31 Thread ashokkumar rajendran
Hi, I have filed ticket SPARK-13900. There was an initial reply from a developer, but I have not received any further replies since then. How can we do multiple hash joins together for OR-condition-based joins? Could someone please guide us on how to fix this? Regards, Ashok
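
A minimal sketch (not from the ticket itself) of the usual workaround until the optimizer handles this case: split the OR condition into separate equi-joins, each of which Spark can execute as a hash join, and union the results. The DataFrame and column names below are placeholders.

```scala
// Hypothetical DataFrames for the big table and the small dimension table.
val byDim1 = largeDF.join(smallDF, largeDF("dimension1") === smallDF("dimension1"))
val byDim2 = largeDF.join(smallDF, largeDF("dimension2") === smallDF("dimension2"))

// Each branch is a plain equi-join, so Spark can pick a hash join for it.
// unionAll keeps the 1.x API; distinct drops rows that matched both branches.
val result = byDim1.unionAll(byDim2).distinct()
```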

Re: How to efficiently query a large table with multiple dimensional table?

2016-03-12 Thread ashokkumar rajendran
Any input on this? Does it have something to do with the SQL engine parser / optimizer? Please help. Regards, Ashok. On Fri, Mar 11, 2016 at 3:22 PM, ashokkumar rajendran <ashokkumar.rajend...@gmail.com> wrote: Hi All, I have a large table with a few billion rows and have a v…

How to efficiently query a large table with multiple dimensional table?

2016-03-11 Thread ashokkumar rajendran
Hi All, I have a large table with a few billion rows and a very small table with 4 dimension values. I would like to get rows that match any of these dimensions. For example: Select field1, field2 from A, B where A.dimension1 = B.dimension1 OR A.dimension2 = B.dimension2 OR A.dimension3 …
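
Since the dimension table here has only a handful of rows, one hedged workaround (not from the thread) is to broadcast it, so the OR condition is evaluated as a broadcast nested-loop join against an in-memory copy of B instead of shuffling the billion-row table A. Names are taken from the example query; the DataFrames themselves are assumptions.

```scala
import org.apache.spark.sql.functions.broadcast

// The broadcast hint ships the tiny table B to every executor,
// avoiding a shuffle of the large table A.
val matched = A.join(
  broadcast(B),
  A("dimension1") === B("dimension1") ||
  A("dimension2") === B("dimension2") ||
  A("dimension3") === B("dimension3"))
```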

Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread ashokkumar rajendran
… to rewrite them. Regards, Ashok. On Fri, Mar 4, 2016 at 3:52 PM, ayan guha <guha.a...@gmail.com> wrote: Most likely you are missing the import of org.apache.spark.sql.functions. In any case, you can write your own function for floor and use it as a UDF. On Fri, Mar 4, 2016 a…
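
A short sketch of the two routes mentioned in the reply, assuming a long epoch-millisecond column named "timestamp" on a DataFrame called events (both names are guesses): the built-in floor needs the functions import, and the same bucketing can be registered as a UDF for use inside SQL.

```scala
import org.apache.spark.sql.functions.floor

// Built-in floor, once org.apache.spark.sql.functions is imported:
// bucket start = floor(ts / 5 min) * 5 min, all in milliseconds.
val bucketed = events.withColumn(
  "bucket_ms", floor(events("timestamp") / 300000L) * 300000L)

// Or the hand-rolled UDF route suggested in the reply, callable from SQL:
sqlContext.udf.register("floor5min", (ts: Long) => ts - (ts % 300000L))
```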

Facing issue with floor function in spark SQL query

2016-03-04 Thread ashokkumar rajendran
Hi, I load a JSON file that has a timestamp (as a long in milliseconds) and several other attributes. I would like to group the records into 5-minute buckets and store each bucket as a separate file. I am facing a couple of problems here: 1. Using the floor function in the select clause (to bucket by 5 minutes) gives me an error saying …
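
For context, a hedged end-to-end sketch of the bucketing described above, assuming 1.6-era APIs, an epoch-millisecond "timestamp" field, and a placeholder input path; writing the output partitioned by the bucket column (to Parquet here) produces one directory per 5-minute window, which may or may not match the "separate file" requirement exactly.

```scala
import org.apache.spark.sql.functions.floor

val events = sqlContext.read.json("events.json")   // path is a placeholder

// 300000 ms = 5 minutes; the bucket column holds the window start time.
val withBucket = events.withColumn(
  "bucket", floor(events("timestamp") / 300000L) * 300000L)

// One output directory per 5-minute bucket.
withBucket.write.partitionBy("bucket").parquet("bucketed-events")
```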

Re: Do we need schema for Parquet files with Spark?

2016-03-04 Thread ashokkumar rajendran
… Parquet figures out the right encoding to use for you. Xinh. On Mar 3, 2016, at 7:32 PM, ashokkumar rajendran <ashokkumar.rajend...@gmail.com> wrote: Hi, I am exploring to use Apache Parquet with Spark SQL in our proj…
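
As a hedged illustration of the point quoted above: a DataFrame written to Parquet carries its schema in the file metadata, so reading it back needs no separate schema definition. The df DataFrame and the paths are placeholders.

```scala
// Writing embeds the DataFrame's schema (and per-column encodings)
// in the Parquet files themselves.
df.write.parquet("output/events.parquet")

// Reading back needs no schema: it is recovered from the Parquet footer.
val restored = sqlContext.read.parquet("output/events.parquet")
restored.printSchema()
```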

Re: Do we need schema for Parquet files with Spark?

2016-03-03 Thread ashokkumar rajendran
… but not developers. Regards, Ashok. On Fri, Mar 4, 2016 at 11:01 AM, Ted Yu <yuzhih...@gmail.com> wrote: Have you taken a look at https://parquet.apache.org/community/ ? On Thu, Mar 3, 2016 at 7:32 PM, ashokkumar rajendran <ashokkumar.rajend...@gmail.com> wrote: Hi, …

How to start spark streaming application with recent past timestamp for replay of old batches?

2016-02-21 Thread ashokkumar rajendran
Hi Folks, I am exploring Spark Streaming with two sources, (a) Kinesis and (b) HDFS, for some of our use cases. Since we maintain state gathered over the last x hours in Spark Streaming, we would like to replay the data from the last x hours as batches during deployment. I have gone through the Spark …
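
Not an answer from the thread, just a sketch of the knob that comes closest in the 1.6-era Kinesis receiver: the initial position can be TRIM_HORIZON (the oldest record the stream still retains) or LATEST, not an arbitrary timestamp, so replaying "the last x hours" usually means starting from TRIM_HORIZON and filtering on the record timestamps. All application, stream, endpoint, and region names below are placeholders.

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val ssc = new StreamingContext(new SparkConf().setAppName("replay-demo"), Seconds(60))

// TRIM_HORIZON starts from the oldest record Kinesis still retains, so the
// stream effectively replays past data before catching up to live traffic.
val stream = KinesisUtils.createStream(
  ssc, "replay-demo-app", "my-stream",
  "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
  InitialPositionInStream.TRIM_HORIZON, Seconds(60),
  StorageLevel.MEMORY_AND_DISK_2)
```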