I've been trying to obtain clarification on the terms of use regarding
repo.spark-packages.org. I emailed feedb...@spark-packages.org two weeks
ago, but have not heard back. Whom should I contact?
On Mon, Apr 26, 2021 at 8:13 AM Bo Zhang wrote:
> Hi Apache Spark users,
>
> As you might know,
One advantage of RDDs over DataFrames is that RDDs allow you to use your
own data types, whereas DataFrames are backed by RDDs of Row objects,
which are pretty flexible but don't give you much in the way of
compile-time type checking. If you have an RDD of case class elements or
JSON, then
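To illustrate, here's a rough sketch of that difference (assuming an existing
SparkContext sc and SQLContext sqlContext; Person and the sample data are made up):

    import org.apache.spark.rdd.RDD

    case class Person(name: String, age: Int)

    // RDD of a user-defined type: the element type is checked at compile time.
    val people: RDD[Person] =
      sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25)))
    val ages: RDD[Int] = people.map(_.age) // compiler knows 'age' exists and is an Int

    // DataFrame: backed by an RDD of Row objects, so field access is checked only at runtime.
    val df = sqlContext.createDataFrame(people)
    val agesFromRows = df.rdd.map(_.getAs[Int]("age")) // a wrong column name fails only when run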
Yes, I know, but it would be nice to be able to test things myself before I
push commits.
On Sun, Oct 25, 2015 at 3:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> If you have a pull request, Jenkins can test your change for you.
>
> FYI
>
> On Oct 25, 2015, at 12:43 PM, Richar
When I try to start up sbt for the Spark build, or if I try to import it
in IntelliJ IDEA as an sbt project, it fails with a "No such file or
directory" error when it attempts to "git clone" sbt-pom-reader into
.sbt/0.13/staging/some-sha1-hash.
If I manually create the expected directory before
Also, if I run the Maven build on Windows or Linux without setting
-DskipTests=true, it hangs indefinitely when it gets to
org.apache.spark.JavaAPISuite.
It's hard to test patches when the build doesn't work. :-/
On Sun, Oct 25, 2015 at 3:38 PM, Richard Eggert <richard.egg...@gmail.com>
wrote:
> When I try to start up sbt for the Spark build, or if I try to import it
> in IntelliJ IDEA as an sbt project, it fails with a "No such file or
> directory" error when it attempts to "git clone"
If you want to override the default partitioning behavior, you have to do
so in your code where you create each RDD. Different RDDs usually have
different numbers of partitions (except when one RDD is directly derived
from another without shuffling) because they usually have different sizes,
so
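For example (a rough sketch; the RDD, key type, and partition counts here are arbitrary):

    import org.apache.spark.HashPartitioner

    // Assume 'pairs' is an RDD[(String, Int)] created earlier in your code.
    val partitioned = pairs.partitionBy(new HashPartitioner(16)) // explicit partitioning scheme
    val resized = pairs.repartition(8)                           // just change the partition count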
I think the problem may be that callUDF takes a DataType indicating the
return type of the UDF as its second argument.
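Something like this, for example (a sketch only; 'df' and the column name are hypothetical):

    import org.apache.spark.sql.functions.callUDF
    import org.apache.spark.sql.types.IntegerType

    // The DataType of the UDF's return value is the second argument here.
    val withLength = df.withColumn("name_length",
      callUDF((s: String) => s.length, IntegerType, df("name")))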
On Oct 12, 2015 9:27 AM, "Umesh Kacha" wrote:
> Hi, if you could help it would be great, as I am stuck and don't know how to
> remove the compilation error in callUdf
It's the same as joining 2. Join two together, and then join the third one
to the result of that.
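Roughly (with hypothetical DataFrames a, b, and c sharing a join key "id"):

    val ab  = a.join(b, "id")
    val abc = ab.join(c, "id") // chain the third join onto the result of the first two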
On Oct 11, 2015 2:57 PM, "Subhajit Purkayastha" wrote:
> Can I join 3 different RDDs together in a Spark SQL DF? I can find
> examples for 2 RDDs but not 3.
>
>
>
> Thanks
>
>
>
Do you need the HashMap for anything else besides writing out to a file? If
not, there is really no need to create one at all. You could just keep
everything as RDDs.
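For example, a rough sketch of doing it entirely with RDDs (the record layout,
timestamp field, and output path are assumptions):

    // Assume 'records' is an RDD[(String, (Long, String))]: key -> (timestamp, payload).
    val latestPerKey = records.reduceByKey((a, b) => if (a._1 >= b._1) a else b)
    latestPerKey
      .map { case (key, (ts, payload)) => s"$key,$ts,$payload" }
      .saveAsTextFile("/tmp/latest-records")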
On Oct 10, 2015 11:31 AM, "kali.tumm...@gmail.com"
wrote:
> Got it ..., created hashmap and saved it to
> requirement is to get the latest records using a key; I think a hash map is a
> good choice for this task.
> As of now the data comes from a third party and we are not sure what the
> latest record is, so a hash map was chosen.
> Is there anything better than a hash map? Please let me know.
>
> Th
Since the Python API is built on top of the Scala implementation, its
performance can be at best roughly the same as that of the Scala API (as in
the case of DataFrames and SQL) and at worst several orders of magnitude
slower.
Likewise, since the Scala implementation of new features
That should have read "a lot of neat tricks", not "a lot of nest tricks".
That's what I get for sending emails on my phone.
On Oct 6, 2015 8:32 PM, "Richard Eggert" <richard.egg...@gmail.com> wrote:
> Since the Python API is built on top of the Scala
> 1) Create a Cassandra RDD
>
> 2) Cache this RDD
>
> 3) Map it to CSV
>
> 4) Coalesce(because I need a single output file)
>
> 5) Write to file on local file system
>
>
>
> This makes sense.
>
>
>
> Thanks,
>
>
>
> Chirag
>
If there's only one partition, by definition it will only be handled by one
executor. Repartition to divide the work up. Note that this will also
result in multiple output files, however. If you absolutely need them to
be combined into a single file, I suggest using the Unix/Linux 'cat'
command
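A rough sketch of that trade-off (the partition count, the toCsvLine function, and
the paths are placeholders):

    // Spread the work across executors, at the cost of multiple part-* files:
    rdd.repartition(8).map(toCsvLine).saveAsTextFile("/tmp/output")
    // Then combine them outside of Spark, e.g.:
    //   cat /tmp/output/part-* > combined.csv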
In general, RDDs get partitioned automatically without programmer
intervention. You generally don't need to worry about them unless you need
to adjust the size/number of partitions or the partitioning scheme
according to the needs of your application. Partitions get redistributed
among nodes
Maybe it's just my phone, but I don't see any code.
On Sep 22, 2015 11:46 AM, "juljoin" wrote:
> Hello,
>
> I am trying to figure Spark out and I still have some problems with its
> speed that I can't figure out. In short, I wrote two programs that loop
> through a
> GreaterThan(C, X). You then can
> programmatically convert C to a.c. Note that in buildScan the required
> columns would also have an extra column C that you need to return in the
> buildScan RDD.
>
>
> It looks complicated, but I think it would work.
>
>
> Thanks.
>
>
I defined my own relation (extending BaseRelation) and implemented the
PrunedFilteredScan interface, but discovered that if the column referenced
in a WHERE = clause is a user-defined type or a field of a struct column,
then Spark SQL passes NO filters to the PrunedFilteredScan.buildScan
method,
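For reference, a bare-bones sketch of the interface in question (MyRelation and the
stubbed members are invented):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
    import org.apache.spark.sql.types.StructType

    class MyRelation(override val sqlContext: SQLContext, override val schema: StructType)
        extends BaseRelation with PrunedFilteredScan {
      // Filters on plain top-level columns show up in 'filters'; in the case described
      // above, filters on UDT columns or struct fields never do.
      override def buildScan(requiredColumns: Array[String],
                             filters: Array[Filter]): RDD[Row] = ???
    }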
Hmm... The count() method invokes this:
def runJob[T, U: ClassTag](rdd: RDD[T], func: Iterator[T] => U): Array[U] = {
  runJob(rdd, func, 0 until rdd.partitions.length)
}
It appears that you're running out of memory while trying to compute
(within the driver) the number of partitions that will
Parallel processing is what Spark was made for. Let it do its job. Spawning
your own threads independently of what Spark is doing seems like you'd just
be asking for trouble.
I think you can accomplish what you want by taking the cartesian product of
the data element RDD and the feature list RDD
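Something along these lines (both RDDs and the scoring function are placeholders):

    // Pair every data element with every feature.
    val pairs = dataRDD.cartesian(featureListRDD)
    val scored = pairs.map { case (element, feature) => score(element, feature) }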
Greetings,
I have recently started using Spark SQL and ran up against two rather odd
limitations related to UserDefinedTypes.
The first is that there appears to be no way to register a UserDefinefType
other than by adding the @SQLUserDefinedType annotation to the class being
mapped. This makes
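For context, this is roughly what the annotation-based registration looks like
(MyPoint and MyPointUDT are invented, and the conversion bodies are stubbed out):

    import org.apache.spark.sql.types._

    @SQLUserDefinedType(udt = classOf[MyPointUDT])
    case class MyPoint(x: Double, y: Double)

    class MyPointUDT extends UserDefinedType[MyPoint] {
      override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)
      override def serialize(obj: Any): Any = ???          // MyPoint -> sqlType encoding
      override def deserialize(datum: Any): MyPoint = ???  // sqlType encoding -> MyPoint
      override def userClass: Class[MyPoint] = classOf[MyPoint]
    }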
concat and locate are available as of version 1.5.0, according to the
Scaladocs. For earlier versions of Spark, and for the operations that are
still not supported, it's pretty straightforward to define your own
UserDefinedFunctions in either Scala or Java (I don't know about other
languages).
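For example, a small sketch of rolling your own in Scala (the function, 'df', and the
column name are made up):

    import org.apache.spark.sql.functions.{lit, udf}

    // A stand-in for a string function that isn't built in yet.
    val locateUdf = udf((substr: String, str: String) => str.indexOf(substr) + 1)
    val result = df.select(locateUdf(lit("spark"), df("text")))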