Hi,
Is this guy a silly billy for comparing Apache Flink with Apache Spark?
https://www.youtube.com/watch?v=sYlbD_OoHhs
Airbus makes more of the sky with Flink - Jesse Anderson & Hassene Ben Salem
Does Apache Spark support distributed execution as well as map-reduce on a
DataFrame?
So no MapReduce: Spark intelligently uses the needed pieces from Hive and uses
its own execution engine.
--Regards,
Lalit
On Wed, Jun 8, 2016 at 9:59 PM, Vikash Pareek <vikash.par...@infoobjects.com> wrote:
Himanshu,
Spark doesn't use the Hive execution engine (MapReduce) to execute queries. Spark
only reads the metadata from the Hive metastore DB and executes the query
within the Spark execution engine. This metadata is used by Spark's own SQL
execution engine (this includes components such as Catalyst
the results to Spark? In this case, might Hive be using
map-reduce to execute the queries?
Please clarify this confusion. I have looked into the code and it seems like Spark
is just fetching the data from HDFS. Please convince me otherwise.
Thanks
Best
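One concrete way to see which engine runs a Hive-backed query is to look at its
physical plan. A minimal sketch (the table name "sales" is hypothetical; this
assumes Spark 1.x with an existing SparkContext sc, on 2.x the equivalent is a
SparkSession with enableHiveSupport()):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// The metastore only supplies the schema and file locations; the query below
// is planned by Catalyst and executed entirely by Spark's own engine.
val result = hiveContext.sql(
  "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

result.explain()  // the physical plan shows Spark operators only, no MapReduce jobs
result.show()

If Hive's engine were involved you would see MapReduce jobs launched on the
cluster; instead both the plan and the running job are pure Spark.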
Hello everyone, I am trying to compute the similarity between 550k objects
using the DIMSUM algorithm available in Spark 1.6.
The cluster runs on AWS Elastic MapReduce and consists of 6 r3.2xlarge
instances (one master and five core nodes), each with 8 vCPUs and 61 GiB of RAM.
My input data
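For reference, the DIMSUM computation itself boils down to columnSimilarities
on a RowMatrix. A minimal sketch (the input path and the comma-separated
parsing are made up; the 550k objects have to be the columns of the matrix):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Parse each line into a dense feature vector (hypothetical input format).
val rows = sc.textFile("/path/to/features")
  .map(line => Vectors.dense(line.split(",").map(_.toDouble)))

val mat = new RowMatrix(rows)

// A threshold > 0 enables the sampling-based (approximate) DIMSUM variant;
// 0.0 computes exact cosine similarities between all column pairs.
val similarities = mat.columnSimilarities(0.5)
similarities.entries.take(10).foreach(println)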
Hi All,
I'm running Spark 1.4.1 on an 8-core machine with 16 GB RAM. I have a 500MB
CSV file with 10 columns and I need to separate it into multiple
CSV/Parquet files based on one of the fields in the CSV file. I've loaded
the CSV file using spark-csv and applied the below transformations. It
You can also look into https://spark.apache.org/docs/latest/tuning.html for
performance tuning.
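For the splitting itself, something along these lines might work (an untested
sketch; "category" is just a stand-in for your split column, and it writes
partitioned Parquet rather than one CSV file per value):

// Load the CSV with spark-csv (assumes the file has a header row).
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/path/to/input.csv")

// partitionBy creates one sub-directory per distinct value of the column,
// e.g. /path/to/output/category=foo/part-*
df.write
  .partitionBy("category")
  .parquet("/path/to/output")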
Thanks
Best Regards
On Mon, Jun 15, 2015 at 10:28 PM, Rex X dnsr...@gmail.com wrote:
Thanks very much, Akhil.
That solved my problem.
Best,
Rex
On Mon, Jun 15, 2015 at 2:16 AM, Akhil Das
Something like this?
val huge_data = sc.textFile("/path/to/first.csv").map(x =>
  (x.split("\t")(1), x.split("\t")(0)))
val gender_data = sc.textFile("/path/to/second.csv").map(x =>
  (x.split("\t")(0), x))
val joined_data = huge_data.join(gender_data)
joined_data.take(1000)
It's Scala btw, the Python API should
To be concrete, say we have a folder with thousands of tab-delimited csv
files with following attributes format (each csv file is about 10GB):
id    name    address    city    ...
1     Matt    add1       LA      ...
2     Will    add2       LA      ...
3     Lucy    add3       SF      ...
...
And we have a
Dear all,
Does anyone know how I can force Spark to use only the disk when doing a
simple flatMap(..).groupByKey.reduce(_ + _)? Thank you!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/map-reduce-only-with-disk-tp23102.html
Sent from the Apache Spark
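One workaround that gets suggested for this (an assumption on my part, not a
guaranteed disk-only execution) is to persist the intermediate RDDs with
DISK_ONLY and to swap groupByKey + reduce for reduceByKey, which shuffles far
less; note the shuffle itself still buffers in memory before spilling:

import org.apache.spark.storage.StorageLevel

// Hypothetical word-count style pipeline; only the storage level matters here.
val pairs = sc.textFile("/path/to/input")
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .persist(StorageLevel.DISK_ONLY)

// reduceByKey combines map-side, so far less data is shuffled than with
// groupByKey followed by a reduce over the grouped values.
val counts = pairs.reduceByKey(_ + _)
counts.persist(StorageLevel.DISK_ONLY)
counts.count()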
Hi,
I have a JavaPairRDD<String, List<Integer>> and, as an example, this is what I
want to get:
user_id    cat1    cat2    cat3    cat4
522        0       1       2       0
62         1       0       3       0
661        1       2       0       1
query: which users have a non-zero value in both the cat2 and cat3 columns?
answer: cat2 -> {522, 661}, cat3 -> {522, 62}, so the intersection is user 522
How can I
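One way to express that query (a sketch in Scala, the Java API is analogous;
the sample data simply mirrors the table above):

import org.apache.spark.rdd.RDD

// (user_id, counts for cat1..cat4), copied from the example table.
val userCats: RDD[(String, List[Int])] = sc.parallelize(Seq(
  ("522", List(0, 1, 2, 0)),
  ("62",  List(1, 0, 3, 0)),
  ("661", List(1, 2, 0, 1))))

// Keep users with a non-zero value in both cat2 (index 1) and cat3 (index 2).
val matching = userCats
  .filter { case (_, cats) => cats(1) != 0 && cats(2) != 0 }
  .keys

matching.collect().foreach(println)  // prints 522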
Please take a look at https://issues.apache.org/jira/browse/PHOENIX-1815
On Mon, Apr 20, 2015 at 10:11 AM, Jeetendra Gangele gangele...@gmail.com
wrote:
Thanks for the reply.
Will using Phoenix inside Spark be useful?
What is the best way to bring data from HBase into Spark in terms
I think the recommended approach will be creating a DataFrame using HBase as
the source. Then you can run any SQL on that DF.
In 1.2 you can create a base RDD and then apply a schema in the same manner,
roughly as in the sketch below.
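A rough sketch of that approach against the 1.3+ DataFrame API (on 1.2 the
equivalent is applySchema on a SchemaRDD); the table name "users" and the
column "cf:name" are made up:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "users")  // hypothetical table

// Read the HBase table as an RDD of (rowkey, Result) pairs.
val hbaseRdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Project each Result into a Row; "cf" / "name" are assumed family/qualifier.
val rowRdd = hbaseRdd.map { case (key, result) =>
  Row(Bytes.toString(key.get()),
      Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))))
}

val schema = StructType(Seq(
  StructField("rowkey", StringType),
  StructField("name", StringType)))

val sqlContext = new SQLContext(sc)
val df = sqlContext.createDataFrame(rowRdd, schema)
df.registerTempTable("users")

// Any SQL, including range predicates on the row key, now runs in Spark;
// the scan itself is still a full table scan unless filters are pushed down
// via the HBase Scan configuration.
sqlContext.sql(
  "SELECT rowkey, name FROM users WHERE rowkey >= '100' AND rowkey < '200'").show()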
On 21 Apr 2015 03:12, Jeetendra Gangele gangele...@gmail.com wrote:
Thanks for reply.
Does phoenix using
Hi All,
I am querying HBase, combining the results, and using them in my Spark job.
I am querying HBase using the HBase client API inside my Spark job.
Can anybody suggest whether Spark SQL will be fast enough and provide range
queries?
Regards
Jeetendra
Thanks for the reply.
Will using Phoenix inside Spark be useful?
What is the best way to bring data from HBase into Spark in terms of
application performance?
Regards
Jeetendra
On 20 April 2015 at 20:49, Ted Yu yuzhih...@gmail.com wrote:
To my knowledge, Spark SQL currently doesn't provide
To my knowledge, Spark SQL currently doesn't provide range scan capability
against hbase.
Cheers
On Apr 20, 2015, at 7:54 AM, Jeetendra Gangele gangele...@gmail.com wrote:
Hi All,
I am querying HBase, combining the results, and using them in my Spark job.
I am querying HBase using the HBase
') and move on to the next part.
Can anyone please explain to me which solution is better?
Thank you very much,
Shlomi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/randomSplit-instead-of-a-huge-map-reduce-tp21744.html
Sent from the Apache Spark User
that works in that stuff tell me if that problem can be
fixed?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/HBase-Thrift-API-Error-on-map-reduce-functions-tp21439.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
am missing out here?
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7033.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
pairs?
Can this be done in a distributed manner, as this data set is going to have
a few million records?
Can we do this in map/reduce commands?
Thanks.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-done-in-map-reduce-technique-in-parallel-tp6905.html
Hi Cheng,
Thanks a lot. That solved my problem.
Thanks again for the quick response and solution.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7047.html
Sent from the Apache Spark User List mailing
It is possible if you use a cartesian product to produce all possible
pairs for each IP address and 2 stages of map-reduce:
- first by pairs of points to find the total of each pair and
- second by IP address to find the pair for each IP address with the
maximum count.
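A rough, untested sketch of that two-stage approach (the (ip, point) sample
records are made up, and a self-join plays the role of the per-IP cartesian
product):

// Hypothetical (ip, point) visit records.
val visits = sc.parallelize(Seq(
  ("1.2.3.4", "A"), ("1.2.3.4", "B"), ("1.2.3.4", "A"),
  ("1.2.3.4", "B"), ("5.6.7.8", "C"), ("5.6.7.8", "D")))

// All point pairs observed for the same IP; keep one ordering per pair and
// drop (x, x) self-pairs.
val pairsPerIp = visits.join(visits)
  .filter { case (_, (p1, p2)) => p1 < p2 }
  .map { case (ip, pair) => ((ip, pair), 1) }

// Stage 1: total count of each (ip, pair of points).
val pairCounts = pairsPerIp.reduceByKey(_ + _)

// Stage 2: for each IP, keep the pair with the maximum count.
val topPairPerIp = pairCounts
  .map { case ((ip, pair), count) => (ip, (pair, count)) }
  .reduceByKey { (a, b) => if (a._2 >= b._2) a else b }

topPairPerIp.collect().foreach(println)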
Oleg
on for all combinations.
This is where I get stuck. Please guide me on this.
Thanks Again.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7016.html
Sent from the Apache Spark User List mailing list archive