subject:"Need help in SparkSQL"

Re: Need help with SparkSQL Query

2018-12-17 Thread Ramandeep Singh Nanda

You can use analytic (window) functions in Spark SQL. Something like:

    select * from (
      select *, row_number() over (partition by id order by timestamp) as rn
      from root
    ) where rn = 1

On Mon, Dec 17, 2018 at 4:03 PM Nikhil Goyal wrote:
> Hi guys,
>
> I have a dataframe of type Record (id: Long, timestamp:
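The window-function approach can be tried without a cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in for Spark SQL (SQLite 3.25+ supports the same standard row_number() over (partition by ... order by ...) syntax). The table name root, the extra metric column and the sample rows are made up for illustration, and the query also filters on isValid first, since the question asks for the earliest valid record.

```python
import sqlite3

# Hypothetical sample data standing in for the Record dataframe:
# (id, timestamp, isValid, metric)
ROWS = [
    (1, 300, 1, "a"),
    (1, 100, 0, "b"),  # earliest row for id 1, but not valid
    (1, 200, 1, "c"),  # earliest valid row for id 1
    (2, 50, 1, "d"),   # earliest valid row for id 2
    (2, 70, 1, "e"),
]

# Keep only valid rows, number them per id by timestamp, keep number 1.
QUERY = """
select id, timestamp, isValid, metric from (
    select *,
           row_number() over (partition by id order by timestamp) as rn
    from root
    where isValid
) where rn = 1
order by id
"""


def earliest_valid(rows):
    """Return the earliest valid row per id (SQLite used as a Spark SQL stand-in)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table root (id integer, timestamp integer, isValid integer, metric text)"
        )
        conn.executemany("insert into root values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in earliest_valid(ROWS):
        print(row)
```

Filtering on isValid inside the subquery (rather than after numbering) matters: otherwise an invalid but earlier row would take rn = 1 and the id would drop out entirely.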

Re: Need help with SparkSQL Query

2018-12-17 Thread Patrick McCarthy

Untested, but something like the below should work:

    from pyspark.sql import functions as F
    from pyspark.sql import window as W

    (record
        .withColumn('ts_rank', F.dense_rank().over(W.Window.orderBy('timestamp').partitionBy("id")))
        .filter(F.col('ts_rank') == 1)
        .drop('ts_rank')
    )

On Mon, Dec 17,
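One difference between the two answers is worth checking: dense_rank() gives every row that shares the earliest timestamp rank 1, so the filter can keep several rows per id, whereas row_number() keeps exactly one (chosen arbitrarily among ties unless more ordering columns are added). The sketch below demonstrates this with SQLite's window functions as a stand-in for Spark; the table name record and the rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: (id, timestamp); id 1 has two rows tied at its earliest timestamp.
ROWS = [(1, 10), (1, 10), (1, 20), (2, 5)]


def rank_counts(rows):
    """Count rows per id that survive `ts_rank = 1` under dense_rank() and row_number()."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table record (id integer, timestamp integer)")
        conn.executemany("insert into record values (?, ?)", rows)
        result = {}
        for fn in ("dense_rank", "row_number"):
            query = f"""
                select id, count(*) from (
                    select id,
                           {fn}() over (partition by id order by timestamp) as ts_rank
                    from record
                ) where ts_rank = 1
                group by id
                order by id
            """
            result[fn] = dict(conn.execute(query).fetchall())
        return result
    finally:
        conn.close()


if __name__ == "__main__":
    print(rank_counts(ROWS))
```

If exactly one record per id is required, the PySpark counterpart of this change is using F.row_number() in place of F.dense_rank().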

Need help with SparkSQL Query

2018-12-17 Thread Nikhil Goyal

Hi guys,

I have a dataframe of type Record (id: Long, timestamp: Long, isValid: Boolean, other metrics). The schema looks like this:

    root
     |-- id: long (nullable = true)
     |-- timestamp: long (nullable = true)
     |-- isValid: boolean (nullable = true)
     .

I need to find the earliest valid record

Need help in SparkSQL

2015-07-22 Thread Jeetendra Gangele

Hi all, I have data in MongoDB (a few TBs) that I want to migrate to HDFS to run complex query analysis on it, for example AND queries involving multiple fields. So my question is: in which format should I store the data in HDFS so that processing is fast for this kind of query?

Re: Need help in SparkSQL

2015-07-22 Thread Jörn Franke

Can you provide an example of an AND query? If you only do look-ups, you should try HBase/Phoenix; otherwise you can try ORC with its storage index and/or compression, but this depends on what your queries look like.

On Wed, 22 Jul 2015 at 14:48, Jeetendra Gangele gangele...@gmail.com wrote:
HI
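The idea behind ORC's storage index can be sketched in a few lines. ORC keeps lightweight min/max statistics per stripe and per row group, so a reader evaluating an AND of predicates can skip any block where even one predicate cannot match. The toy model below is plain Python, not ORC's actual file format or API; the block size, the column names area_id and price, and the data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A toy 'stripe': some rows plus per-column (min, max) statistics."""
    rows: list
    stats: dict  # column name -> (min, max)


def build_blocks(rows, block_size):
    """Split rows into fixed-size blocks and record min/max for every column."""
    blocks = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        stats = {col: (min(r[col] for r in chunk), max(r[col] for r in chunk))
                 for col in chunk[0]}
        blocks.append(Block(chunk, stats))
    return blocks


def and_query(blocks, ranges):
    """Evaluate an AND of inclusive range predicates {col: (lo, hi)}.

    Returns (matching rows, number of blocks actually scanned)."""
    matches, scanned = [], 0
    for block in blocks:
        # Skip the block if any predicate's range cannot overlap the block's min/max.
        if any(hi < block.stats[c][0] or lo > block.stats[c][1]
               for c, (lo, hi) in ranges.items()):
            continue
        scanned += 1
        matches.extend(r for r in block.rows
                       if all(lo <= r[c] <= hi for c, (lo, hi) in ranges.items()))
    return matches, scanned


if __name__ == "__main__":
    # Hypothetical data, sorted by area_id, which is what makes the min/max ranges selective.
    data = [{"area_id": i // 10, "price": (i * 7) % 100} for i in range(100)]
    blocks = build_blocks(data, block_size=10)
    rows, scanned = and_query(blocks, {"area_id": (3, 3), "price": (0, 50)})
    print(len(rows), "rows,", scanned, "of", len(blocks), "blocks scanned")
```

The pruning only pays off when the data is laid out so that blocks have narrow min/max ranges on the filtered columns; on randomly ordered data almost every block overlaps every predicate.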

Re: Need help in SparkSQL

2015-07-22 Thread Jörn Franke

I do not think you can put all your queries into the row key without duplicating the data for each query. However, that would be more of a last resort. Have you checked out Phoenix for HBase? It might suit your needs, and it makes things much simpler because it provides SQL on top of HBase. Nevertheless,
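The row-key point can be illustrated with a toy model. HBase stores rows sorted by row key and serves range scans efficiently, so a query is cheap when its filter fields form a prefix of the key; serving a second access pattern means storing a second copy of the data keyed differently. The sketch below mimics this with sorted Python lists and bisect; the key layouts, the "|" separator, the field names and the events are hypothetical, and this is not the HBase or Phoenix API.

```python
import bisect

# Hypothetical visit events: (user, area, property_id, epoch_seconds)
EVENTS = [
    ("u1", "andheri", "p9", 1000),
    ("u2", "andheri", "p7", 1500),
    ("u3", "bandra", "p9", 1200),
    ("u1", "bandra", "p4", 1800),
]


def build_table(events, key_fields):
    """Build a toy 'table': a list of (row_key, event) sorted by row key.

    The row key joins the chosen fields with '|'. Timestamps here all have the
    same width, so string order matches numeric order (real designs pad them)."""
    rows = [("|".join(str(e[i]) for i in key_fields), e) for e in events]
    rows.sort()
    return rows


def prefix_scan(table, prefix):
    """Return events whose row key starts with prefix, using a sorted range scan.

    Assumes ASCII keys, so prefix + '\\xff' sorts after every real continuation."""
    keys = [k for k, _ in table]  # rebuilt per call; fine for a toy
    start = bisect.bisect_left(keys, prefix)
    stop = bisect.bisect_left(keys, prefix + "\xff")
    return [e for _, e in table[start:stop]]


if __name__ == "__main__":
    # One copy keyed for "visits in an area", another keyed for "visits to a property".
    by_area = build_table(EVENTS, (1, 3, 0))      # area|timestamp|user
    by_property = build_table(EVENTS, (2, 3, 0))  # property|timestamp|user
    print(prefix_scan(by_area, "andheri|"))
    print(prefix_scan(by_property, "p9|"))
```

Each access pattern gets its own fully populated copy of EVENTS, which is the duplication described above; Phoenix secondary indexes automate a similar trade-off behind SQL.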

Re: Need help in SparkSQL

2015-07-22 Thread Jeetendra Gangele

The queries will be something like this:
1. How many users visited 1 BHK flats in a given area in the last hour?
2. How many visitors were there for flats in a given area?
3. List all users who bought a given property in the last 30 days.
Further, it may get even more complex, involving multiple parameters in my query. The
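To make the first example concrete: it is an AND of three conditions (flat type, area, time window). The sketch below expresses it as SQL over a hypothetical visits table, run through Python's built-in sqlite3 as a stand-in for Spark SQL; the table, column names and sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical visits: (user_id, area, flat_type, visited_at in epoch seconds)
VISITS = [
    ("u1", "andheri", "1BHK", 9_000),   # older than one hour at now=13_000
    ("u2", "andheri", "1BHK", 12_500),
    ("u2", "andheri", "1BHK", 12_900),  # repeat visit by the same user
    ("u5", "andheri", "1BHK", 11_000),
    ("u3", "andheri", "2BHK", 12_000),
    ("u4", "bandra", "1BHK", 12_800),
]

# Query 1: distinct users who visited a given flat type in a given area
# with visited_at in [now - 3600, now].
QUERY_1 = """
select count(distinct user_id)
from visits
where flat_type = ? and area = ? and visited_at >= ? - 3600 and visited_at <= ?
"""


def users_in_last_hour(visits, area, flat_type, now):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table visits (user_id text, area text, flat_type text, visited_at integer)"
        )
        conn.executemany("insert into visits values (?, ?, ?, ?)", visits)
        (count,) = conn.execute(QUERY_1, (flat_type, area, now, now)).fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    print(users_in_last_hour(VISITS, "andheri", "1BHK", now=13_000))
```

A filter of this shape, equality on a few fields plus a range on time, is the kind of predicate that columnar formats with min/max statistics (ORC, Parquet) can push down to skip data.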

RE: Need help in SparkSQL

2015-07-22 Thread Mohammed Guller

Parquet

Mohammed

From: Jeetendra Gangele [mailto:gangele...@gmail.com]
Sent: Wednesday, July 22, 2015 5:48 AM
To: user
Subject: Need help in SparkSQL

HI All, I have data in MongoDb(few TBs) which I want to migrate to HDFS to do complex queries analysis on this data.Queries like AND queries

8 matches
