.
Best Regards,
Jerry
Sent from my iPad
On Jul 24, 2014, at 6:53 AM, Sameer Sayyed sam.sayyed...@gmail.com wrote:
Hello All,
I am a new user of Spark. I am using cloudera-quickstart-vm-5.0.0-0-vmware to
run the sample Spark examples.
I am very sorry for the silly and basic question.
I am
org.apache.spark.sql.hive.*;
Let me know what I'm doing wrong.
Thanks,
Jerry
the spark shell, so all I do is Test.run(sc) in the shell.
Let me know what to look for to debug this problem; I'm not sure where to
start.
Thanks,
Jerry
By the way, if Hive is present in the Spark install, does it show up in the text
when you start the spark shell? Are there any commands I can run to check if it
exists? I didn't set up the spark machine that I use, so I don't know what's
present or absent.
Thanks,
Jerry
On Mon, Aug 10, 2015 at 2:38 PM
So it seems like DataFrames aren't going to give me a break and just work. Now
it evaluates, but goes nuts if it runs into a null case OR doesn't know how
to get the correct data type when I specify the default value as a string
expression. Let me know if anyone has a workaround for this. PLEASE HELP
those links
point me to something useful. Let me know if you can run the above code /
what you did differently to get that code to run.
Thanks,
Jerry
On Fri, Aug 14, 2015 at 1:23 PM, Salih Oztop soz...@yahoo.com wrote:
Hi Jerry,
This blog post is perfect for window functions in Spark.
https
)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
On Fri, Aug 14, 2015 at 1:39 PM, Jerry jerry.c...@gmail.com wrote:
Hi Salih,
Normally I do sort before
Thanks!
On Wed, Aug 26, 2015 at 2:06 PM, Marcelo Vanzin van...@cloudera.com wrote:
On Wed, Aug 26, 2015 at 2:03 PM, Jerry jerry.c...@gmail.com wrote:
Assuming you're submitting the job from the terminal: when main() is called,
if I
try to open a file locally, can I assume the machine is always
.
The JSON messages are coming from a Kafka consumer at over 1,500 messages
per second, so the message processing (parsing and writing to Cassandra)
also needs to complete at the same rate (1,500/second).
Thanks in advance.
Jerry
I would appreciate any help and advice you can give me.
time)
But only about 100 messages can be inserted into Cassandra in each round of
testing.
Can anybody give me advice on why the other messages (about 900) can't
be consumed?
How do I configure and tune the parameters to improve the
throughput of the consumers?
Thank you very much fo
Rado,
Yes, you are correct. A lot of messages are created at almost the same
time (even using milliseconds). I changed to "UUID.randomUUID()", with
which all messages can be inserted into the Cassandra table without a time lag.
Thank you very much!
Jerry Wong
On Wed, Feb 17, 2016
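For reference, a minimal sketch of that fix with the spark-cassandra-connector, assuming messages is an RDD of parsed Kafka records; the case class, fields, keyspace, and table names are made up:

    import java.util.UUID
    import com.datastax.spark.connector._

    case class Event(id: UUID, ts: Long, body: String)

    // a unique id in the primary key keeps rows with identical timestamps
    // from overwriting each other on insert
    val events = messages.map(m => Event(UUID.randomUUID(), m.ts, m.body))
    events.saveToCassandra("ks", "events")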
Hi David,
Thank you for your response.
Before inserting into Cassandra, I checked and the data was already missing
in HDFS (my second step is to load data from HDFS and then insert into
Cassandra).
Can you send me the link about this bug in 0.8.2?
Thank you!
Jerry
On Thu, May 5, 2016 at 12:38
and confirmed the
same number in the broker. But when I checked either HDFS or Cassandra, the
number is just 363. The data is not always lost, just sometimes... That's
weird and annoying to me.
Can anybody give me some reasons?
Thanks!
Jerry
Hi Shark,
Should I assume that Shark users should not use the Shark APIs since there
is no documentation for them? If there is documentation, can you point me
to it?
Best Regards,
Jerry
On Thu, Apr 3, 2014 at 9:24 PM, Jerry Lam chiling...@gmail.com wrote:
Hello everyone,
I have
it with spark,
I don't think you can get a lot of performance from scanning HBase unless
you are talking about caching the results from HBase in Spark and reusing
them over and over.
HTH,
Jerry
On Wed, Apr 9, 2014 at 12:02 PM, David Quigley dquigle...@gmail.com wrote:
Hi all,
We are currently using hbase
Hi Spark users,
Do you guys plan to go to the Spark Summit? Can you recommend any hotels near
the conference? I'm not familiar with the area.
Thanks!
Jerry
Hi guys,
I ended up reserving a room at the Phoenix (Hotel:
http://www.jdvhotels.com/hotels/california/san-francisco-hotels/phoenix-hotel)
recommended by my friend who has been in SF.
According to Google, it is an 11-minute walk to the conference, which is not
too bad.
Hope this helps!
Jerry
the error you
saw. By reducing the number of cores, there are more CPU resources
available to a task, so the GC can finish before the error gets thrown.
HTH,
Jerry
On Tue, Jul 8, 2014 at 1:35 PM, Aaron Davidson ilike...@gmail.com wrote:
There is a difference from actual GC overhead, which can
+1 as well for being able to submit jobs programmatically without using a
shell script.
We also experience issues submitting jobs programmatically without using
spark-submit. In fact, even in the Hadoop world, I rarely used hadoop jar
to submit jobs from the shell.
On Wed, Jul 9, 2014 at 9:47 AM,
that defines how my application
should look. In my humble opinion, using Spark as an embeddable library
rather than as the main framework and runtime is much easier.
On Wed, Jul 9, 2014 at 5:14 PM, Jerry Lam chiling...@gmail.com wrote:
+1 as well for being able to submit jobs programmatically without
or is it a bug in Spark SQL?
Best Regards,
Jerry
issue?
For the curious mind, the dataset is about 200-300GB and we are using 10
machines for this benchmark. Given that the environment is identical between
the two experiments, why is pure Spark faster than Spark SQL?
Best Regards,
Jerry
By the way, I also tried hql("select * from m").count. It is terribly slow
too.
On Thu, Jul 10, 2014 at 5:08 PM, Jerry Lam chiling...@gmail.com wrote:
Hi Spark users and developers,
I'm doing some simple benchmarks with my team and we found a potential
performance issue using Hive via
Hi Spark users,
Also, to put the performance issue into perspective, we also ran the query
on Hive. It took about 5 minutes to run.
Best Regards,
Jerry
On Thu, Jul 10, 2014 at 5:10 PM, Jerry Lam chiling...@gmail.com wrote:
By the way, I also tried hql("select * from m").count. It is terribly
overhead, then there must be something additional
that Spark SQL adds to the overall overhead that Hive doesn't have.
Best Regards,
Jerry
On Thu, Jul 10, 2014 at 7:11 PM, Michael Armbrust mich...@databricks.com
wrote:
On Thu, Jul 10, 2014 at 2:08 PM, Jerry Lam chiling...@gmail.com wrote
[], (MetastoreRelation test, m, None), None
HiveTableScan [id#106], (MetastoreRelation test, s, Some(s)), None
Best Regards,
Jerry
On Thu, Jul 10, 2014 at 7:16 PM, Michael Armbrust mich...@databricks.com
wrote:
Hi Jerry,
Thanks for reporting this. It would be helpful if you could
of spark, but maybe not.
HTH,
Jerry
On Mon, Jul 14, 2014 at 3:09 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
You currently can't use SparkContext inside a Spark task, so in this case
you'd have to call some kind of local K-means library. One example you can
try to use is Weka (http
Then yarn application -kill appid should work. This is what I did 2 hours ago.
Sorry I cannot provide more help.
Sent from my iPhone
On 14 Jul, 2014, at 6:05 pm, hsy...@gmail.com hsy...@gmail.com wrote:
yarn-cluster
On Mon, Jul 14, 2014 at 2:44 PM, Jerry Lam chiling...@gmail.com wrote
Hi Rajesh,
can you describe your spark cluster setup? I saw localhost:2181 for
zookeeper.
Best Regards,
Jerry
On Tue, Jul 15, 2014 at 9:47 AM, Madabhattula Rajesh Kumar
mrajaf...@gmail.com wrote:
Hi Team,
Could you please help me resolve the issue?
Issue: I'm not able to connect
. uber-jar it and run it just like any other simple
Java program. If you still have connection issues, then at least you know
the problem comes from the configuration.
HTH,
Jerry
On Tue, Jul 15, 2014 at 12:10 PM, Krishna Sankar ksanka...@gmail.com
wrote:
One vector to check is the HBase libraries
://issues.apache.org/jira/browse/SPARK-2483 seems to
address only HiveQL.
Best Regards,
Jerry
On Tue, Jul 15, 2014 at 3:38 AM, anyweil wei...@gmail.com wrote:
Thank you so much for the information. Now I have merged the fix of #1411
and
it seems the HiveQL works with:
SELECT name FROM people WHERE
. --jars A.jar,B.jar,C.jar, not --jars A.jar, B.jar, C.jar.
I'm just guessing, because when I used --jars I never had spaces in it.
HTH,
Jerry
On Wed, Jul 16, 2014 at 5:30 AM, Madabhattula Rajesh Kumar
mrajaf...@gmail.com wrote:
Hi Team,
Now I've changed my code and I'm reading the configuration from
java.lang.RuntimeException: [1.57] failure:
``('' expected but identifier myudf found
I also tried returning a List of Ints; that did not work either. Is
there a way to write a UDF that returns a list?
Thanks
-Jerry
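For what it's worth, a hedged sketch of a UDF returning a list: on Spark 1.3+ a Seq return type maps to an SQL array (earlier releases used sqlContext.registerFunction), and the function, column, and table names here are made up:

    // register a UDF whose Seq[Int] result surfaces as an array column
    sqlContext.udf.register("myudf", (s: String) => s.split(",").map(_.toInt).toSeq)
    sqlContext.sql("SELECT myudf(csv_col) FROM t")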
Hi,
If I create a SchemaRDD from a file that I know is sorted on a certain
field, is it possible to somehow pass that information on to Spark SQL
so that SQL queries referencing that field are optimized?
Thanks
-Jerry
with name = apple with
early stopping.
Is this possible? If yes, how does one implement the contains function?
Best Regards,
Jerry
in which I can do that. The farthest I can get is to
convert items.toSeq. The type information I got back is:
scala> items.toSeq
res57: Seq[Any] = [WrappedArray([1,orange],[2,apple])]
Any suggestions?
Best Regards,
Jerry
Hi Mark,
Thank you for helping out.
The items I got back from Spark SQL have the following type information:
scala> items
res16: org.apache.spark.sql.Row = [WrappedArray([1,orange],[2,apple])]
I tried to iterate over the items as you suggested, but no luck.
Best Regards,
Jerry
On Mon, Dec 15
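A minimal sketch of one way to unwrap such a Row, assuming the 1.2-era API and that the first field holds the array of (id, name) structs:

    // items(0) is the WrappedArray of structs; each element is itself a Row
    val inner = items(0).asInstanceOf[Seq[org.apache.spark.sql.Row]]
    inner.foreach(r => println(s"${r.getInt(0)} -> ${r.getString(1)}"))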
.user_id == t2.user_id)
nor
t1.join(t2, on = Some('t1.user_id == t2.user_id))
work, or even compile. I could not find any examples of how to perform a
join using the DSL. Any pointers will be appreciated :)
Thanks
-Jerry
Another problem with the DSL:
t1.where('term == "dmin").count() returns zero, but
sqlCtx.sql("select * from t1 where term = 'dmin'").count() returns 700,
which I know is correct from the data. Is there something wrong with how
I'm using the DSL?
Thanks
On 17/12/14 11:13 am, Jerry Raj wrote
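A likely culprit, sketched under the assumption that this is the 1.2-era SchemaRDD DSL: Scala's built-in == compares the column symbol to the string and yields a plain Boolean rather than a Catalyst predicate, so the DSL's === operator is needed:

    // 'term == "dmin" evaluates with Scala's ==, not the DSL; use === to build a predicate
    t1.where('term === "dmin").count()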
Hi spark users,
Do you know how to read JSON files with Spark SQL when they are LZO-compressed?
I'm looking into sqlContext.jsonFile, but I don't know how to configure it
to read LZO files.
Best Regards,
Jerry
Hi Ted,
Thanks for your help.
I'm able to read LZO files using sparkContext.newAPIHadoopFile, but I
couldn't do the same with sqlContext because sqlContext.jsonFile does not
provide a way to configure the input file format. Do you know if there are
APIs to do that?
Best Regards,
Jerry
On Wed
)
In some scenarios, Hadoop is faster because it is saving one stage. Did I
do something wrong?
Best Regards,
Jerry
On Wed, Dec 17, 2014 at 1:29 PM, Michael Armbrust mich...@databricks.com
wrote:
You can create an RDD[String] using whatever method and pass that to
jsonRDD.
On Wed, Dec 17, 2014
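Following that suggestion, a hedged sketch of the LZO route, assuming the hadoop-lzo package (which provides com.hadoop.mapreduce.LzoTextInputFormat) is on the classpath; the path is made up:

    import org.apache.hadoop.io.{LongWritable, Text}
    import com.hadoop.mapreduce.LzoTextInputFormat

    // build an RDD[String] from LZO-compressed text, then hand it to jsonRDD
    val lines = sc.newAPIHadoopFile(
        "hdfs:///data/events.json.lzo",
        classOf[LzoTextInputFormat],
        classOf[LongWritable],
        classOf[Text])
      .map(_._2.toString)

    val events = sqlContext.jsonRDD(lines)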
Hi Spark users,
I wonder if val resultRDD = RDDA.union(RDDB) will always have records in
RDDA before records in RDDB.
Also, will resultRDD.coalesce(1) change this ordering?
Best Regards,
Jerry
Hi Sean and Madhu,
Thank you for the explanation. I really appreciate it.
Best Regards,
Jerry
On Fri, Dec 19, 2014 at 4:50 AM, Sean Owen so...@cloudera.com wrote:
coalesce actually changes the number of partitions. Unless the
original RDD had just 1 partition, coalesce(1) will make an RDD
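A quick spark-shell check of the observed behavior (a sketch, not a contract):

    val a = sc.parallelize(Seq(1, 2), 2)
    val b = sc.parallelize(Seq(3, 4), 2)
    // union concatenates partitions: a's partitions come before b's, and
    // coalesce(1) without shuffle walks them in that order
    a.union(b).coalesce(1).collect()   // Array(1, 2, 3, 4)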
Michael,
Thanks. Is this still turned off in the released 1.2? Is it possible to
turn it on just to get an idea of how much of a difference it makes?
-Jerry
On 05/12/14 12:40 am, Michael Armbrust wrote:
I'll add that some of our data formats will actually infer this sort of
useful information
)
at
org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:382)
Is this supported?
Best Regards,
Jerry
.
Is this something already possible with Spark/Tachyon? If not, do you think
it is possible? Does anyone mind sharing their experience capturing
data lineage in a data processing pipeline?
Best Regards,
Jerry
. However, I didn't use the
spark-csv package; I did it manually, so I cannot comment on
spark-csv.
HTH,
Jerry
On Thu, Feb 5, 2015 at 9:32 AM, Spico Florin spicoflo...@gmail.com wrote:
Hello!
I'm using spark-csv 2.10 with Java from the maven repository
<groupId>com.databricks</groupId>
Hi guys,
Does this issue affect 1.2.0 only or all previous releases as well?
Best Regards,
Jerry
On Thu, Jan 8, 2015 at 1:40 AM, Xuelin Cao xuelincao2...@gmail.com wrote:
Yes, the problem is, I've turned the flag on.
One possible reason for this is, the parquet file supports predicate
not affiliated
with Cloudera, but it seems they are the only ones who are very active in the
Spark project and provide a Hadoop distribution.
HTH,
Jerry
btw, who is Paco Nathan?
On Thu, Jan 22, 2015 at 10:03 AM, Babu, Prashanth
prashanth.b...@nttdata.com wrote:
Sudipta,
Use the Docker image
Hi Deep,
what do you mean by stuck?
Jerry
On Mon, Feb 2, 2015 at 12:44 AM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
Hi,
Is there any better operation than union? I am using union and the cluster
is getting stuck with a large dataset.
Thank you
Hi Deep,
How do you know the cluster is not responsive because of Union?
Did you check the spark web console?
Best Regards,
Jerry
On Mon, Feb 2, 2015 at 1:21 AM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
The cluster hangs.
On Mon, Feb 2, 2015 at 11:25 AM, Jerry Lam chiling
objects.
I'm thinking of overriding the saveAsParquetFile method to allow me to
persist the Avro schema inside the Parquet file. Is this possible at all?
Best Regards,
Jerry
On Fri, Jan 9, 2015 at 2:05 AM, Raghavendra Pandey
raghavendra.pan...@gmail.com wrote:
I cam across this
http://zenfractal.com
wasn't that bad at all. If it is not indexed,
I expect it to take a much longer time.
Can IndexedRDD be sorted by keys as well?
Best Regards,
Jerry
On Tue, Jan 13, 2015 at 11:06 AM, Andrew Ash and...@andrewash.com wrote:
Hi Jem,
Linear time in scaling on the big table doesn't seem
that. Is there
another API that allows me to do this?
Best Regards,
Jerry
is in
comparison to Flink is one of the immediate questions I have. It would be
great if they had the benchmark software available somewhere for other
people to experiment with.
just my 2 cents,
Jerry
On Sun, Jul 5, 2015 at 4:35 PM, Ted Yu yuzhih...@gmail.com wrote:
There was no mentioning
Hi Guru,
Thanks! Great to hear that someone tried it in production. How do you like
it so far?
Best Regards,
Jerry
On Tue, Aug 18, 2015 at 11:38 AM, Guru Medasani gdm...@gmail.com wrote:
Hi Jerry,
Yes. I’ve seen customers using this in production for data science work.
I’m currently
Hi.
I want to parse a file and return key-value pairs with PySpark, but the
result is strange to me.
test.sql is a big file and each line is a username and password, with
"#" between them. I use the mapper2 below to map the data, and in my
understanding, i in words.take(10) should be a tuple, but the result
is
Hi Prabeesh,
That's even better!
Thanks for sharing
Jerry
On Tue, Aug 18, 2015 at 1:31 PM, Prabeesh K. prabsma...@gmail.com wrote:
Refer this post
http://blog.prabeeshk.com/blog/2015/06/19/pyspark-notebook-with-docker/
Spark + Jupyter + Docker
On 18 August 2015 at 21:29, Jerry Lam
cannot do this.
Other solutions (e.g. Zeppelin) seem to reinvent the wheel that IPython
already offered years ago. It would be great if someone could educate me on
the reason behind this.
Best Regards,
Jerry
into server:
/etc/httpd/modules/mod_authz_core.so: cannot open shared object file: No
such file or directory
[FAILED]
Best Regards,
Jerry
On Mon, Aug 17, 2015 at 11:09 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
Howdy folks!
I’m interested in hearing about what people think of spark-ec2
,
Jerry
on.
Thank you for your help!
Jerry
On Thu, Jul 30, 2015 at 11:10 AM, Ted Yu yuzhih...@gmail.com wrote:
The files were dated 16-Jul-2015
Looks like nightly build either was not published, or published at a
different location.
You can download spark-1.5.0-SNAPSHOT.tgz and binary-search
My experience with Mesos + Spark is not great. I saw one executor with 30 CPUs
and the other executor with 6, so I don't think you can easily configure it
without some tweaking of the source code.
Sent from my iPad
On 2015-08-11, at 2:38, Haripriya Ayyalasomayajula aharipriy...@gmail.com
Just out of curiosity, what is the advantage of using parquet without hadoop?
Sent from my iPhone
On 11 Aug, 2015, at 11:12 am, saif.a.ell...@wellsfargo.com wrote:
I confirm that it works,
I was just having this issue: https://issues.apache.org/jira/browse/SPARK-8450
Saif
From:
before?
Best Regards,
Jerry
Hi Akshat,
Is there a particular reason you don't use s3a? From my experience, s3a performs
much better than the rest. I believe the inefficiency comes from the
implementation of the s3 interface.
Best Regards,
Jerry
Sent from my iPhone
On 9 Aug, 2015, at 5:48 am, Akhil Das ak
Great stuff Tim. This will definitely make Mesos users' lives easier.
Sent from my iPad
On 2015-08-12, at 11:52, Haripriya Ayyalasomayajula aharipriy...@gmail.com
wrote:
Thanks Tim, Jerry.
On Wed, Aug 12, 2015 at 1:18 AM, Tim Chen t...@mesosphere.io wrote:
Yes the options
. The speed is 4x faster in
the data-without-mapping case, which means that the more columns a Parquet file
has, the slower it is, even when only a specific column is needed.
Does anyone have an explanation for this? I was expecting both of them to finish
in approximately the same time.
Best Regards,
Jerry
Hi guys,
I noticed that too. Anders, can you confirm that it works on the Spark 1.5
snapshot? This is what I tried in the end. It seems to be a 1.4 issue.
Best Regards,
Jerry
On Wed, Jul 22, 2015 at 11:46 AM, Anders Arpteg arp...@spotify.com wrote:
No, never really resolved the problem, except
similar style off-heap memory
mgmt, more planning optimizations
From: Jerry Lam [mailto:chiling...@gmail.com]
Sent: Sunday, July 5, 2015 6:28 PM
To: Ted Yu
Cc: Slim Baltagi; user
Subject: Re: Benchmark results between Flink and Spark
Hi guys,
I just read
You mean this does not work?
SELECT key, count(value) from table group by key
On Sun, Jul 19, 2015 at 2:28 PM, N B nb.nos...@gmail.com wrote:
Hello,
How do I go about performing the equivalent of the following SQL clause in
Spark Streaming? I will be using this on a Windowed DStream.
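For reference, a minimal sketch of that GROUP BY count on a windowed DStream, assuming pairs: DStream[(String, String)] is built upstream (on 1.3+ the pair operations are available implicitly) and the window/slide durations are stand-ins:

    import org.apache.spark.streaming.Seconds

    val counts = pairs
      .window(Seconds(60), Seconds(10))   // the windowed part
      .mapValues(_ => 1L)
      .reduceByKey(_ + _)                 // SELECT key, count(value) ... GROUP BY key
    counts.print()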
Yes.
Sent from my iPhone
On 19 Jul, 2015, at 10:52 pm, Jahagirdar, Madhu
madhu.jahagir...@philips.com wrote:
All,
Can we run different version of Spark using the same Mesos Dispatcher. For
example we can run drivers with Spark 1.3 and Spark 1.4 at the same time ?
Regards,
Madhu
Hi Nikunj,
Sorry, I totally misread your question.
I think you need to first groupByKey (get all values of the same key together),
then follow with mapValues (probably put the values into a set and then take
its size, because you want a distinct count).
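A minimal sketch of that recipe, assuming an RDD[(K, V)] named rdd:

    val distinctCounts = rdd
      .groupByKey()                             // all values of the same key together
      .mapValues(values => values.toSet.size)   // distinct count per key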
HTH,
Jerry
Sent from my iPhone
?
--
From: Jerry Lam [chiling...@gmail.com]
Sent: Monday, July 20, 2015 8:27 AM
To: Jahagirdar, Madhu
Cc: user; d...@spark.apache.org
Subject: Re: Spark Mesos Dispatcher
Yes.
Sent from my iPhone
On 19 Jul, 2015, at 10:52 pm, Jahagirdar, Madhu
madhu.jahagir...@philips.com wrote
mory which is
a bit odd in my opinion. Any help will be greatly appreciated.
Best Regards,
Jerry
On Sun, Oct 25, 2015 at 9:25 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
> Hi Jerry,
>
> Do you have speculation enabled? A write which produces one million files
> / output pa
)
org.apache.spark.sql.execution.datasources.LogicalRelation.(LogicalRelation.scala:31)
org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:395)
org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:267)
On Sun, Oct 25, 2015 at 10:25 PM, Jerry Lam <chiling...@gmail.com> wrote:
> Hi Josh,
>
>
parameters to make it more memory efficient?
Best Regards,
Jerry
On Sun, Oct 25, 2015 at 8:39 PM, Jerry Lam <chiling...@gmail.com> wrote:
> Hi guys,
>
> After waiting for a day, it actually causes OOM on the spark driver. I
> configure the driver to have 6GB. Note that I didn't c
million
files. Not sure why it OOMs the driver after the job is marked _SUCCESS in
the output folder.
Best Regards,
Jerry
On Sat, Oct 24, 2015 at 9:35 PM, Jerry Lam <chiling...@gmail.com> wrote:
> Hi Spark users and developers,
>
> Does anyone encounter any issue when a spark SQL job
I used Spark 1.3.1 to populate the event logs to Cassandra. But there
is an exception for which I could not find any clues. Can anybody give me
any help?
Exception in thread "main" java.lang.IllegalArgumentException: Positive
number of slices required
at
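That message usually comes from sc.parallelize being handed a non-positive slice count; a hedged sketch of the pattern and a guard, with made-up names:

    val events: Seq[String] = Seq.empty    // imagine an empty batch
    // sc.parallelize(events, events.size) // fails: 0 slices -> "Positive number of slices required"
    val rdd = sc.parallelize(events, math.max(1, events.size))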
.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Any idea why it can read the schema from the parquet file but not
process the file? It feels like the hadoop configuration is not sent to
the executors for some reason...
Thanks,
Jerry
oad the parquet file, but I cannot perform a count on the parquet file
because of the AmazonClientException. It means that the credentials are used
during the loading of the parquet file but not when we are processing the
parquet file. How can this happen?
Best Regards,
Jerry
On Tue, Oct 27, 2015 at 2:05 PM,
t;key",
"value") does not propagate through all SQL jobs within the same
SparkContext? I haven't try with Spark Core so I cannot tell.
Is there a workaround given it seems to be broken? I need to do this
programmatically after the SparkContext is instantiated not before...
Best Regards,
J
Hi Bryan,
Did you read the email I sent a few days ago? There are more issues with
partitionBy down the road:
https://www.mail-archive.com/user@spark.apache.org/msg39512.html
Best Regards,
Jerry
> On Oct 28, 2015, a
of partitions is over 100.
Best Regards,
Jerry
Sent from my iPhone
> On 26 Oct, 2015, at 2:50 am, Fengdong Yu <fengdo...@everstring.com> wrote:
>
> How many partitions you generated?
> if Millions generated, then there is a huge memory consumed.
?
Best Regards,
Jerry
. it takes a while to
initialize the partition table and it requires a lot of memory from the driver.
I would not use it if the number of partitions goes over a few hundred.
Hope this helps,
Jerry
Sent from my iPhone
> On 28 Oct, 2015, at 6:33 pm, Bryan <bryan.jeff...@gmail.com> wrote:
&
Hi Ted,
That looks like exactly what happens. It has been 5 hrs now. The code was built
for 1.4. Thank you very much!
Best Regards,
Jerry
Sent from my iPhone
> On 14 Nov, 2015, at 11:21 pm, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Which release are you using ?
> If older th
r. the max-date is likely
> to be faster though.
>
> On Sun, Nov 1, 2015 at 4:36 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi Koert,
>>
>> You should be able to see if it requires scanning the whole data by
>> "explain" the query. The physica
Hi Koert,
You should be able to see if it requires scanning the whole data by
running "explain" on the query. The physical plan should say something about it. I
wonder if you are trying the distinct-sort-by-limit approach or the
max-date approach?
Best Regards,
Jerry
On Sun, Nov 1, 2015
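For concreteness, a sketch of the two variants being compared, with a made-up table name:

    sqlContext.sql("SELECT max(date) FROM logs").explain()
    sqlContext.sql("SELECT DISTINCT date FROM logs ORDER BY date DESC LIMIT 1").explain()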
of the physical plan, you can navigate the actual
execution in the web UI to see how much data is actually read to satisfy
this request. I hope it only requires a few bytes for a few dates.
Best Regards,
Jerry
On Sun, Nov 1, 2015 at 5:56 PM, Jerry Lam <chiling...@gmail.com> wrote:
> I agreed the
s actually works or
not. :)
Best Regards,
Jerry
On Sun, Nov 1, 2015 at 3:03 PM, Koert Kuipers <ko...@tresata.com> wrote:
> hello all,
> i am trying to get familiar with spark sql partitioning support.
>
> my data is partitioned by date, so like this:
> data/date=2015-01-01
>
A spark.sql.hive.enabled=false configuration would be lovely too. :)
An additional bonus is that it requires less memory if we don't use
HiveContext on the driver side (~100-200MB, from a rough observation).
Thanks and have a nice weekend!
Jerry
> On Nov 6, 2015, at 5:53 PM, Ted Yu <yuzhih...@gma
. /home/jerry directory). It will give me an exception
like the one below.
Since I don't use HiveContext, I don't see the need to maintain a database.
What is interesting is that the pyspark shell is able to start more than one
session at the same time. I wonder what pyspark does better than spark-shell
We "used" Spark on Mesos to build interactive data analysis platform
because the interactive session could be long and might not use Spark for
the entire session. It is very wasteful of resources if we used the
coarse-grained mode because it keeps resource for the entire session.
Therefore,
Does Qubole use Yarn or Mesos for resource management?
Sent from my iPhone
> On 5 Nov, 2015, at 9:02 pm, Sabarish Sasidharan
> wrote:
>
> Qubole
)
at org.apache.derby.jdbc.Driver20.connect(Unknown Source)
at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
Best Regards,
Jerry
> On Nov 6, 2015, at 12:12 PM, Ted Yu <yuzhih...@gmail.com&
ply$mcZ$sp(SparkILoopExt.scala:127)
at
org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)
at
org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)
Best Regards,
Jerry
onfig of skipping the above call.
>
> FYI
>
> On Fri, Nov 6, 2015 at 8:53 AM, Jerry Lam <chiling...@gmail.com
> <mailto:chiling...@gmail.com>> wrote:
> Hi spark users and developers,
>
> Is it possible to disable HiveContext from being instantiated when usin
I'm interested in it, but I doubt there will be R-tree indexing support in the
near future, as Spark is not a database. You might have better luck looking at
databases with spatial indexing support out of the box.
Cheers
Sent from my iPad
On 2015-10-18, at 17:16, Mustafa Elbehery