https://spark.apache.org/releases/spark-release-3-5-1.html
>>
>> We would like to acknowledge all community members for contributing to
>> this
>> release. This release would not have been possible without you.
>>
>> Jungtaek Lim
>>
>> PS: Yikun is helping us with releasing the official Docker image for
>> Spark 3.5.1 (thanks, Yikun!). It may take some time to become generally available.
>>
>>
--
John Zhuge
https://github.com/apache/arrow-datafusion-comet for more details if
>> you are interested. We'd love to collaborate with people from the open
>> source community who share similar goals.
>>
>> Thanks,
>> Chao
>>
>>
>>
--
John Zhuge
Hi,
This is currently my column definition:
Employee ID | Name    | Client  | Project | Team   | 01/01/2022 | 02/01/2022 | 03/01/2022 | 04/01/2022 | 05/01/2022
12345       | Dummy x | Dummy a | abc     | team a | OFF        | WO         | WH         | WH         | WH
As you can see, the outer columns are just
ct has no attribute 'read_excel'. Can
you advise?
JOHN PAUL JAYME
Data Engineer
m. +639055716384 w. www.tdcx.com
Winner of over 350 Industry Awards
to reduce the madness..
Regards;
John Crowe
TDi Technologies, Inc.
1600 10th Street Suite B
Plano, TX 75074
(800) 695-1258
supp...@tditechnologies.com
From: Sean Owen
Sent: Wednesday, January 12, 2022 10:23 AM
To: Crowe, John
Cc: user@spark.apac
I too would like to know when you anticipate Spark 3.3.0 being released, due to
the Log4j CVEs.
Our customers are all quite concerned.
Regards;
John Crowe
TDi Technologies, Inc.
1600 10th Street Suite B
Plano, TX 75074
(800) 695-1258
supp...@tditechnologies.com
n Karau
>>>> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I'm continuing my adventures to make Spark on containers party and I
>>>>> was wondering if folks have experience with the different batch
>>>>> scheduler options that they prefer? I was thinking so that we can
>>>>> better support dynamic allocation it might make sense for us to
>>>>> support using different schedulers and I wanted to see if there are
>>>>> any that the community is more interested in?
>>>>>
>>>>> I know that one of the Spark on Kube operators supports
>>>>> volcano/kube-batch so I was thinking that might be a place I start
>>>>> exploring but also want to be open to other schedulers that folks
>>>>> might be interested in.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Holden :)
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>>>
>>>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>> --
John Zhuge
Dear Spark team members,
Can you please advise whether column-level encryption is available in Spark SQL?
I am aware that Hive supports column-level encryption.
Appreciate your response.
Thanks,
John
y available.
>
> Kindly provide some insight on this.
>
>
> Paras
> 9130006036
>
--
John
age
> 25.0 (TID 35067, localhost, executor driver)
> : org.apache.hadoop.hdfs.BlockMissingException
> : Could not obtain block:
> BP-1742911633-10.225.201.50-1479296658503:blk_1233169822_159765693
>
> ```
>
> Can anyone please advise how to handle such an exception in PySpark?
>
> --
> Best Regards
> *Divay Jindal*
>
>
>
--
John
Hello all,
I have to read data from Kafka topic at regular intervals. I create the
dataframe as shown below. I don’t want to start reading from the beginning on
each run. At the same time, I don’t want to miss the messages between run
intervals.
val queryDf = sqlContext
.read
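A minimal sketch of the pattern I'm describing, assuming the Kafka batch source from the spark-sql-kafka package: each run reads a bounded offset range and persists the end offsets for the next run. The broker, topic, and local-file offset store below are placeholders:

import java.nio.file.{Files, Paths, StandardOpenOption}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max}

val offsetPath = Paths.get("/tmp/kafka-offsets.json") // placeholder offset store
def loadOffsets(): String =
  if (Files.exists(offsetPath)) new String(Files.readAllBytes(offsetPath))
  else "earliest" // first run: start from the beginning of the topic

val spark = SparkSession.builder.appName("KafkaBatchRead").getOrCreate()

val queryDf = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
  .option("subscribe", "events")                     // placeholder topic
  .option("startingOffsets", loadOffsets())
  .option("endingOffsets", "latest")
  .load()

// ... process queryDf here ...

// startingOffsets is inclusive, so persist (max offset + 1) per partition,
// e.g. {"events":{"0":1235,"1":872}}, for the next run to resume from.
val parts = queryDf
  .groupBy(col("partition"))
  .agg((max(col("offset")) + 1).as("next"))
  .collect()
  .map(r => s""""${r.getInt(0)}":${r.getLong(1)}""")
if (parts.nonEmpty)
  Files.write(offsetPath, s"""{"events":{${parts.mkString(",")}}}""".getBytes,
    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)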
machine (where the app is being submitted from).
>
> On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge <john.zh...@gmail.com> wrote:
> > Thanks Jacek and Marcelo!
> >
> > Any reason it is not sourced? Any security consideration?
> >
> >
> > On Wed, Jan 3, 2018 at 9:59 A
Thanks Jacek and Marcelo!
Any reason it is not sourced? Any security consideration?
On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <jzh...@apache.org> wrote:
> > I am running Spark 2.0.0 and 2.1.
cluster mode. See the YARN-related Spark Properties
> <https://github.com/apache/spark/blob/master/docs/running-on-yarn.html#spark-properties>
> for
> more information.
Does it mean spark-env.sh will not be sourced when starting the AM in cluster
mode?
Does this paragraph apply to executors as well?
Thanks,
--
John Zhuge
Hello TD,
You had replied to one of the questions about checkpointing –
This is an unfortunate design on my part when I was building DStreams :)
Fortunately, we learnt from our mistakes and built Structured Streaming the
correct way. Checkpointing in Structured Streaming stores only the
/tmp/logs/root/logs/application_1501197841826_0013 does not exist.
Log aggregation has not completed or is not enabled.
Any other way to see my logs?
Thanks
John
From: ayan guha <guha.a...@gmail.com>
Sent: Sunday, July 30, 2017 10:34 PM
To: John Zeng; Ri
. But where are they?
Thanks
John
From: Riccardo Ferrari <ferra...@gmail.com>
Sent: Saturday, July 29, 2017 8:18 PM
To: johnzengspark
Cc: User
Subject: Re: Logging in RDD mapToPair of Java Spark application
Hi John,
The reason you don't see the second
I followed the instructions for configuring a custom logger per
https://spark.apache.org/docs/2.0.2/running-on-yarn.html (because we have
long-running Spark jobs that occasionally get stuck, and without a
rolling file appender they fill up the disk). This seems to work well for us,
but it breaks
Am I doing something wrong here? Why is the temp stuff owned by root? Is
there a bug in saving things due to this ownership?
John
Exception:
Py4JJavaError: An error occurred while calling o338.save.
: org.apache.hadoop.security.AccessControlException: User jomernik (user id
101) has been den
No problem. It was a big headache for my team as well. One of us already
reimplemented it from scratch, as seen in this pending PR for our project.
https://github.com/hail-is/hail/pull/1895
Hopefully you find that useful. We'll hopefully try to PR that into Spark at
some point.
Best,
John
Hey Anthony,
You're the first person besides myself I've seen mention this. BlockMatrix
multiply is not the best method. As far as my team and I can tell, the memory
problem stems from the fact that when Spark tries to compute block (i, j) of
the matrix, it tries to manifest all of row i from
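For anyone following along, the blow-up is inherent in the block formulation. Computing output block (i, j) as

    C_{ij} = \sum_k A_{ik} B_{kj}

requires every block in row i of A and every block in column j of B at once unless the summation is staged; with the 100k x 100k matrix in 1000 x 1000 blocks mentioned elsewhere in this thread, that is 100 + 100 = 200 input blocks feeding a single output block.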
I have tried the configuration calculator sheet provided by Cloudera as
well, but saw no improvement. However, ignoring the 17 mil operation to begin
with, let's consider the simple sort on YARN and Spark, which shows a
tremendous difference.
The operation is simple: a selected numeric column to be sorted
at MapR? Usually the system guys target snapshots, volumes,
and POSIX compliance if they are bought into Isilon.
Good luck Mich.
Regards,
John Leach
> On Jun 5, 2017, at 9:27 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi John,
>
> Thanks. Did you
point.
Regards,
John Leach
> On Jun 5, 2017, at 9:11 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> I am concerned about the use case of tools like Isilon or Panasas to create a
> layer on top of HDFS, essentially an HCFS on top of HDFS with the usual 3x
Spark is doing operations on each partition in parallel. If you decrease the number
of partitions, you're potentially doing less work in parallel, depending on your
cluster setup.
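A minimal sketch of the trade-off (master and sizes are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("PartitionDemo").master("local[8]").getOrCreate()

// 200 partitions: up to 200 tasks, executed 8 at a time on local[8].
val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 200)
println(rdd.getNumPartitions) // 200

// Fewer partitions means fewer tasks, so at most 10-way parallelism here,
// even when more cores are available.
println(rdd.coalesce(10).getNumPartitions) // 10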
> On May 23, 2017, at 4:23 PM, Andrii Biletskyi
> wrote:
>
>
> No, I didn't
size? Does anyone have any
suggestions? I’ve tried throwing 900 cores at a 100k by 100k matrix multiply
with 1000 by 1000 sized blocks, and that seemed to hang forever and eventually
fail.
Thanks,
John
Hi Folks,
The Apache Gora team are pleased to announce the immediate availability of
Apache Gora 0.7.
The Apache Gora open source framework provides an in-memory data model and
persistence for big data. Gora supports persisting to column stores, key
value stores, document stores and RDBMSs, and
transmits the data from each of the JVMs over the network. This
>> seems like overkill though.
>>
>> Is there a simpler solution for getting this data into a DataFrame?
>>
>> Thanks,
>> John
>>
>>
>>
>> --
>
The Spark version is 2.1.0.
--
From: 方孝健(玄弟)
Sent: Friday, February 10, 2017, 12:35
To: spark-dev; spark-user
Subject: Driver hung and happened out of memory while writing to
[Stage 172:==> (10328 + 93) / 16144]
[Stage 172:==> (10329 + 93) / 16144]
[Stage 172:==> (10330 + 93) / 16144]
[Stage 172:==>
My Spark main thread creates some daemon threads, possibly timer threads. Then
the Spark application throws some exceptions and the main thread quits, but the
driver JVM doesn't crash on a standalone cluster. Of course, the problem
doesn't happen on a YARN cluster, because the application
My Spark main thread creates some daemon threads. Then the Spark application
throws some exceptions and the main thread quits, but the driver JVM doesn't
crash. What can I do?
for example:
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
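A hedged sketch of one way out, assuming the lingering threads are actually non-daemon (java.util.Timer creates non-daemon threads unless told otherwise): make them daemon and force an explicit exit when the main thread fails.

import java.util.Timer
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("NetworkWordCount")
val sc = new SparkContext(sparkConf)

// Timer threads are non-daemon by default and keep the JVM alive after
// main() returns; isDaemon = true avoids that.
val timer = new Timer("app-timer", true)

try {
  // ... application logic that may throw ...
} finally {
  sc.stop()
  sys.exit(0) // force the JVM down even if stray non-daemon threads remain
}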
I hope I can get the application by its driverId, but I can't find such a REST
API in Spark. How can I get the application that belongs to a given driver?
As we know, each standalone cluster has its own UI, so we will have more than
one UI if we have many standalone clusters. How can I have a single UI that can
access different standalone clusters?
./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to
/home/admin/koala/data/versions/0/SPARK/2.0.2/spark-2.0.2-bin-hadoop2.6/logs/spark-admin-org.apache.spark.deploy.history.HistoryServer-1-v069166214.sqa.zmf.out
Then the history server will print all logs to the
The source is DirectKafkaInputDStream, which can ensure exactly-once semantics
on the consumer side. But I have a question based on the following code. As we
know, "graph.generateJobs(time)" will create RDDs and generate jobs, and the
source RDD is a KafkaRDD, which contains the offsetRange. The jobs are
1. If a task completes its operation, it notifies the driver. The driver may
not receive the message due to the network and may still think the task is
running. Will the child stage then not be scheduled?
2. How does Spark guarantee that the downstream task receives the shuffle data
completely? In fact, I
would
be very much appreciated.
Thanks! :)
--John
One can specify "-Dlog4j.configuration=log4j.properties" or
"-Dlog4j.configuration=log4j.xml".
Is there any preference for using one over the other?
All the Spark documentation talks about using "log4j.properties" only (
http://spark.apache.org/docs/latest/configuration.html#configuring-logging).
So is only a "log4j.properties"
file supported?
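For context, a hedged sketch of how either variant is typically passed in (paths are placeholders; spark.driver.extraJavaOptions and spark.executor.extraJavaOptions are the documented knobs):

import org.apache.spark.SparkConf

// log4j 1.x accepts a URL in -Dlog4j.configuration, e.g. file:/path/to/...
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions",
    "-Dlog4j.configuration=file:/path/to/log4j.properties")
  .set("spark.executor.extraJavaOptions",
    "-Dlog4j.configuration=file:/path/to/log4j.properties")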
> >>>
> >>> Is this even possible or the only way to use R is as part of RStudio
> >>> orchestration of our Spark cluster?
> >>>
> >>>
> >>>
> >>> Thanks for the help!
> >>>
> >>>
>
>>> I want to use R code as part of a Spark application (the same way I would
>>> do with Scala/Python). I want to be able to run R syntax as a map
>>> function on a big Spark DataFrame loaded from a parquet file.
>>>
>>> Is this even possible o
Thanks Saurabh!
That explode function looks like it is exactly what I need.
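For the archives, a minimal sketch of that (Spark 2.x Scala API shown with placeholder data; PySpark has the same explode in pyspark.sql.functions):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

val spark = SparkSession.builder.appName("ExplodeDemo").master("local[*]").getOrCreate()
import spark.implicits._

// One input row packing several values into a single field.
val df = Seq(("row1", "a,b,c")).toDF("id", "tags")

// split turns the string into an array; explode emits one output row per
// element, so the single input row becomes (row1,a), (row1,b), (row1,c).
df.withColumn("tag", explode(split($"tags", ",")))
  .select("id", "tag")
  .show()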
We will be using MLlib quite a lot - Do I have to worry about python
versions for that?
John
On Wed, Jun 22, 2016 at 4:34 PM, Saurabh Sardeshpande <saurabh...@gmail.com>
wrote:
> Hi John,
>
> If you ca
? Particularly how to split
one row into multiple rows.
Lastly, I am a bit hesitant to ask but is there a recommendation on which
version of python to use? Not interested in which is better, just want to
know if they are both supported equally.
I am using Spark 1.6.1 (Hortonworks distro).
Thanks!
John
Hi Mich,
batch interval is 10 seconds, and I don't use sliding window.
Typical message count per batch is ~100k.
--
John Simon
On Fri, Jun 10, 2016 at 10:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com
> wrote:
> Hi John,
>
> I did not notice anything unusual in you
Sorry, forgot to mention that I don't use broadcast variables. That's
why I'm puzzled here.
--
John Simon
On Thu, Jun 9, 2016 at 11:09 AM, John Simon <john.si...@tapjoy.com> wrote:
> Hi,
>
> I'm running Spark Streaming with Kafka Direct Stream, batch interval
> is 10 second
user.country US
user.dir /home/hadoop
user.home /home/hadoop
user.language en
user.name hadoop
user.timezone UTC
```
--
John Simon
Have you had a look at this issue?
https://issues.apache.org/jira/browse/SPARK-12279
There is a comment by Y Bodnar on how they successfully got Kerberos and
HBase working.
2016-05-18 18:13 GMT+10:00 :
> Hi all,
>
> I have been puzzling over a Kerberos
You would want to add a listener to your Spark Streaming context. Have a
look at the StatsReportListener [1].
[1]
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.scheduler.StatsReportListener
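A minimal sketch of wiring it up (app name, master, and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.StatsReportListener

val conf = new SparkConf().setAppName("ListenerDemo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))

// Logs batch processing-delay statistics over the last 10 completed batches.
ssc.addStreamingListener(new StatsReportListener(numBatchInfos = 10))

For custom metrics, the same addStreamingListener call accepts your own StreamingListener implementation (e.g. overriding onBatchCompleted).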
2016-05-17 7:18 GMT+10:00 Samuel Zhou :
> Hi,
>
If you want to share RDDs, it might be a good idea to check out
Tachyon / Alluxio.
For the Thrift server, I believe the datasets are located in your Spark
cluster as RDDs and you just communicate with it via the Thrift
JDBC Distributed Query Engine connector.
2016-05-17 5:12 GMT+10:00
]
https://github.com/apache/oozie/blob/master/sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java
John
2016-05-16 2:33 GMT+10:00 Stephen Boesch <java...@gmail.com>:
>
> There is a committed PR from Marcelo Vanzin addressing that capability:
>
> https://githu
You could handle null values by using the DataFrame.na functions in a
preprocessing step like DataFrame.na.fill().
For reference:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameNaFunctions
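A minimal sketch (placeholder frame and fill value):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("NaFillDemo").master("local[*]").getOrCreate()
import spark.implicits._

// A toy frame with a null in a numeric column.
val df = Seq((Some(1.5), "a"), (None, "b")).toDF("score", "label")

// Replace nulls in "score" with 0.0 before the downstream step.
val cleaned = df.na.fill(0.0, Seq("score"))
cleaned.show()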
John
On 21 April 2016 at 03:41, Andres Perez <and...@tresata.
am hoping I have some options for net=bridge.
Thoughts?
John
volumes, and the port maps, allowing us to edit the command
(have the default be "what works" and if we edit in such a way, it's our
own fault) could give us the freedom to do things like this... does this
capability exist?
Thanks,
John
of the physical node in bridged mode, it doesn't see it and errors
out... as stated we need a bind address, and advertise address if this is
to work), 2. Same restrictions. 3. cluster mode doesn't work for pyspark
shell.
Any other thoughts?
John
On Thu, Jun 11, 2015 at 12:09 AM, Ashwin Shankar
result in really big/heavy Docker images in order to achieve this. So
that got me thinking about the HTTP API, and I was wondering if there is a JIRA
to track this or if this is something Spark is planning.
Thanks!
John
JSON as a first-class citizen eventually, but it is still a ways
off yet.
Any guidance would be sincerely appreciated!
Thanks!
John
We have almost zero node info – just an identifying integer.
John Lilley
From: Alexis Roos [mailto:alexis.r...@gmail.com]
Sent: Friday, March 11, 2016 11:24 AM
To: Alexander Pivovarov <apivova...@gmail.com>
Cc: John Lilley <john.lil...@redpoint.net>; Ovidiu-Cristian MARCU
<ovi
to run our software on
1bn edges.
John Lilley
From: Alexander Pivovarov [mailto:apivova...@gmail.com]
Sent: Friday, March 11, 2016 11:13 AM
To: John Lilley <john.lil...@redpoint.net>
Cc: Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>; lihu
<lihu...@gmail.com>;
        currentGroupSize++;
      }
      if (currentGroupSize >= groupSize) {
        currentGroupSize = 0;
        currentEdge += 2;
      } else {
        currentEdge++;
      }
    }
  }
}
John Lilley
Chief Architect, RedPoint Global Inc.
T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781-705-2077
Skype: jl
. It degrades gracefully along the O(N^2) curve and
additional memory reduces time.
John Lilley
From: Ovidiu-Cristian MARCU [mailto:ovidiu-cristian.ma...@inria.fr]
Sent: Friday, March 11, 2016 8:14 AM
To: John Lilley <john.lil...@redpoint.net>
Cc: lihu <lihu...@gmail.com>; Andrew
A colleague did the experiments and I don't know exactly how he observed that.
I think it was indirect: from the Spark diagnostics indicating the amount of
I/O, he deduced that this was RDD serialization. Also, when he added light
compression to RDD serialization, this improved matters.
John
would get failures. By
contrast, we have a C++ algorithm that solves 1bn edges using memory+disk on a
single 16GB node in about an hour. I think that a very large cluster will do
better, but we did not explore that.
John Lilley
Chief Architect, RedPoint Global Inc.
T: +1 303 541 1516 | M: +1 720
All, I received this today, is this appropriate list use? Note: This was
unsolicited.
Thanks
John
From: Pierce Lamb <pl...@snappydata.io>
11:57 AM (1 hour ago)
to me
Hi John,
I saw you on the Spark Mailing List and noticed you worked for * and
wanted to reach out. My company, Snap
Hi Folks,
!!Apologies for cross posting!!
The Apache Nutch PMC are pleased to announce the immediate release of
Apache Nutch v2.3.1, we advise all current users and developers of the 2.X
series to upgrade to this release.
Nutch is a well matured, production ready Web crawler. Nutch 2.X branch
I have used Spark 1.4 for 6 months. Thanks to all the members of this
community for your great work. I have a question about a logging issue. I
hope this question can be solved.
The program is running under this configuration: YARN cluster, yarn-client
mode.
In Scala, writing code
2015 at 5:33 PM, David John <david_john_2...@outlook.com> wrote:
I have used Spark 1.4 for 6 months. Thanks to all the members of this
community for your great work. I have a question about a logging issue. I
hope this question can be solved.
The program is running under this configurati
c) Instead of having
to repackage a tgz for each app, it would just propagate... am I looking at
this wrong?
John
behavior, and if it's something that might point to a
bug or if it's just classic uninitiated user error :)
John
NPE in Fine Grained Mode:
15/11/12 13:52:00 INFO storage.DiskBlockManager: Created local directory at
/tmp/blockmgr-94b6962b-2c28-4c10-946c-bd3b5c8c8069
15/11/12 13:52:00 INFO stor
the right protocol buffer
class across stage boundaries?
-John
is reached? Are there tuning
parameters that optimize for data all fitting in memory vs. data that must
spill?
Thanks,
John Lilley
From: Igor Berman [mailto:igor.ber...@gmail.com]
Sent: Saturday, October 10, 2015 12:06 PM
To: John Lilley <john.lil...@redpoint.net>
Cc: user@spark.apache.org;
happens when the data set exceeds memory:
does it spill to disk "nicely" or degrade catastrophically?
Thanks,
John Lilley
!
John
I have a happy, healthy Mesos cluster (0.24) running in my lab. I've
compiled spark-1.5.0 and it seems to be working fine, except for one small
issue: my tasks all seem to run on one node (I have 6 in the cluster).
Basically, I have a directory of compressed text files. Compressed, these 25
files
without tons of admin overhead, so I really want to
explore.
Thanks!
John
Hi All,
The Apache Gora team are pleased to announce the immediate availability of
Apache Gora 0.6.1.
What is Gora?
Gora is a framework which provides an in-memory data model and persistence
for big data. Gora supports persisting to column stores, key value stores,
document stores and RDBMSs,
The answer is that my table was not serialized with Kryo, but I started the
spark-sql shell with Kryo, so the data could not be deserialized.
'SparkUI' failed after
16 retries!
Thanks
Joji John
and use that.
Thanks
Joji John
From: Ajay Singal asinga...@gmail.com
Sent: Friday, July 24, 2015 6:59 AM
To: Joji John
Cc: user@spark.apache.org
Subject: Re: ERROR SparkUI: Failed to bind SparkUI java.net.BindException:
Address already in use: Service 'SparkUI
extra hard to first locate
states and then emit all userid-state pairs.
How should I be doing this?
Thanks,
-John
My spark-sql command:
spark-sql --driver-memory 2g --master spark://hadoop04.xx.xx.com:8241 --conf
spark.driver.cores=20 --conf spark.cores.max=20 --conf
spark.executor.memory=2g --conf spark.driver.memory=2g --conf
spark.akka.frameSize=500 --conf spark.eventLog.enabled=true --conf
/:
java.lang.IllegalArgumentException (port out of range:1315905645) [duplicate 7]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
15/06/19 11:49:44 INFO cluster.YarnScheduler: Removed TaskSet 39.0, whose tasks
have all completed, from pool
File "/home/john/spark-1.4.0/python/pyspark/rdd.py"
a = sc.parallelize([(16646160, 1)])
b = stuff.map(lambda x: (16646160, 1))
# b = sc.parallelize(b.collect())
a.join(b).take(10)

It still breaks. (Here again, including the commented-out line fixes the problem.)
So I'm apparently looking at some sort of spark/pyspark bug. Spark 1.2.0.
Any idea?
-John
in their code, but I think someone (with more
knowledge than I) should probably look into this on Spark as well, due to it
appearing to have changed behavior between versions.
Thoughts?
John
All -
I am facing an odd issue and I am not really sure where to go for support
at this point. I am
that
this is happening at is way above my head. :)
On Fri, Jun 5, 2015 at 4:38 PM, John Omernik j...@omernik.com wrote:
Thanks all. The answers post is mine too; I multi-thread. Ted is
aware too, and MapR is helping me with it. I shall report the answer of that
investigation when we
?)
If there are any good docs on this, I'd love to understand it more.
Thanks!
John
Example:

import re

def parseLine(line):
    restr = r"^(\w\w\w ?\d\d? \d\d:\d\d:\d\d) ([^ ]+)"
    logre = re.compile(restr)
    m = logre.search(line[1])  # Why does every record of the RDD have a None value?
yuzhih...@gmail.com wrote:
John:
Which Spark release are you using?
As of 1.4.0, RDD has this method:
def isEmpty(): Boolean = withScope {
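A minimal usage sketch (Scala shown, and PySpark's RDD has the same isEmpty; the stream and sink are placeholders):

import org.apache.spark.streaming.dstream.DStream

// Skip empty micro-batches before touching the sink. isEmpty() takes at
// most one element, so the guard is cheap.
def saveNonEmpty(stream: DStream[String]): Unit =
  stream.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      rdd.foreachPartition(records => records.foreach(println)) // placeholder sink
    }
  }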
FYI
On Fri, Jun 5, 2015 at 9:01 AM, Evo Eftimov evo.efti...@isecc.com wrote:
The foreachPartition callback is provided with an Iterator by the Spark
://twitter.com/deanwampler
http://polyglotprogramming.com
On Mon, Jun 1, 2015 at 2:49 PM, John Omernik j...@omernik.com wrote:
All -
I am facing an odd issue and I am not really sure where to go for
support at this point. I am running MapR
Is there a pythonic/sparkonic way to test for an empty RDD before using
foreachRDD? Basically I am using the Python example at
https://spark.apache.org/docs/latest/streaming-programming-guide.html to
put records somewhere. When I have data it works fine; when I don't, I
get an exception. I am not
be
appreciated.
Thanks!
John
On Mon, Jun 1, 2015 at 6:14 PM, Dean Wampler deanwamp...@gmail.com wrote:
It would be nice to see the code for MapR FS Java API, but my google foo
failed me (assuming it's open source)...
So, shooting in the dark ;) there are a few things I would check, if you
perplexed
on the change from 1.2.0 to 1.3.1.
Thank you,
John
Full Error on 1.3.1 on Mesos:
15/05/19 09:31:26 INFO MemoryStore: MemoryStore started with capacity
1060.3 MB java.lang.NullPointerException at
com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:96
perfectly
fine without any native (non-Java) libraries installed at all?
Thanks for the help,
John
There are some techniques you can use If you geohash
http://en.wikipedia.org/wiki/Geohash the lat-lngs. They will naturally be
sorted by proximity (with some edge cases so watch out). If you go the join
route, either by trimming the lat-lngs or geohashing them, you’re essentially
grouping
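A minimal sketch of the trimming variant (cell size and sample points are placeholders; points near cell borders need neighbor-cell handling):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, round}

val spark = SparkSession.builder.appName("ProximityJoin").master("local[*]").getOrCreate()
import spark.implicits._

val left  = Seq((1, 40.7128, -74.0060)).toDF("lid", "lat", "lng")
val right = Seq((9, 40.7130, -74.0055)).toDF("rid", "lat", "lng")

// Round coordinates to 2 decimals (roughly 1 km cells) and join on the cell
// key, grouping nearby points without a full cross join.
def withCell(df: DataFrame): DataFrame =
  df.withColumn("cellLat", round(col("lat"), 2))
    .withColumn("cellLng", round(col("lng"), 2))

withCell(left).join(withCell(right), Seq("cellLat", "cellLng")).show()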
Maybe try including the jar with
--driver-class-path jar
On Feb 26, 2015, at 12:16 PM, Akshat Aranya aara...@gmail.com wrote:
My guess would be that you are packaging too many things in your job, which
is causing problems with the classpath. When your jar goes in first, you get
the
perspective. I will grant that I am coming from a
traditional background, so some of the older ideas for how to set
things up may be creeping into my thinking, but if that's the case,
I'd love to understand better.
Thanks!
John
though, if side projects are
spinning up to support this, why not make this a feature of the main
project? Or is it just that esoteric that it's not important for the
main project to be looking into it?
On Tue, Feb 24, 2015 at 9:25 AM, Chip Senkbeil chip.senkb...@gmail.com wrote:
Hi John
sort of time frame I could possibly
communicate to my team? Anything I can do?
Thanks!
On Fri, Feb 20, 2015 at 4:36 AM, Iulian Dragoș
iulian.dra...@typesafe.com wrote:
On Thu, Feb 19, 2015 at 2:49 PM, John Omernik j...@omernik.com wrote:
I am running Spark on Mesos and it works quite well
was not clear on those options. If anyone could point me in the right
direction, I would greatly appreciate it!
John
Hi Folks,
Apologies for cross posting :(
As some of you may already know, @ApacheCon NA 2015 is happening in Austin,
TX April 13th-16th.
This email is specifically written to attract all folks interested in
Science and Healthcare... this is an official call to arms! I am aware that
there are