Hi,
I have a Spark 1.6.2 app (previously tested on 2.0.0 as well). It
requires a huge amount of memory (~1.5 TB) for a small dataset (~500 MB). The
memory usage seems to jump when I loop through and inner join to make the
dataset 12 times as wide. The app goes down during or after this loop, when I try
It seems like the best solution is to set yarn.nodemanager.aux-services to
mapred_shuffle,spark_shuffle.
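As a sketch, that maps to a yarn-site.xml fragment like the one below. The spark_shuffle service class shown is Spark's standard external shuffle service for YARN; note that on Hadoop 2.x the MapReduce entry is usually spelled mapreduce_shuffle:

```xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```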
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Hive-and-Spark-together-with-Dynamic-Resource-Allocation-tp27968p27978.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Hi,
My team has a cluster running HDP, with Hive and Spark. We set up Spark to
use dynamic resource allocation, for benefits such as not having to hard-code
the number of executors and freeing resources after use. Everything
is running on YARN.
The problem is that for Spark 1.5.2 with dynamic
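For reference, a minimal spark-defaults.conf sketch of the kind of setup the message describes (property names are from the Spark 1.5/1.6 configuration docs; the executor bounds are hypothetical):

```
spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   20
```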

Hi,
I'm trying to implement a custom one-hot encoder, since I want the output to
be in a specific form, suitable for Theano. Basically, it will add a new column
for each distinct value of the original features and set it to 1 if
the observation contains that specific member of the distinct
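The dense one-column-per-distinct-value layout described above can be sketched in plain Python (no Spark; `one_hot_dense` and its column ordering are illustrative assumptions, not the poster's code):

```python
def one_hot_dense(values):
    """Map each value to a dense 0/1 row, one column per distinct value."""
    categories = sorted(set(values))           # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1                      # hot slot for this observation
        rows.append(row)
    return categories, rows

cats, rows = one_hot_dense(["red", "blue", "red"])
# cats == ["blue", "red"]; rows == [[0, 1], [1, 0], [0, 1]]
```

Each row is a dense list, so the result can be fed to a NumPy array and on to Theano directly.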
I solved this by using a Window partitioned by 'id'. I used lead and lag to
create columns that contained nulls in the places I needed to delete
in each fold. I then removed the rows with the nulls and dropped my additional
columns.
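The lead/lag mechanism can be sketched in plain Python (no Spark; in Spark these would be `pyspark.sql.functions.lead`/`lag` over a Window — the list-based helpers here are illustrative only):

```python
def lead(seq, n=1):
    """Value n rows ahead within an ordered partition, None past the end
    (mimics SQL LEAD)."""
    return [seq[i + n] if i + n < len(seq) else None for i in range(len(seq))]

def lag(seq, n=1):
    """Value n rows behind, None before the start (mimics SQL LAG)."""
    return [seq[i - n] if i - n >= 0 else None for i in range(len(seq))]

# One partition (a single id), already ordered by date.
dates = [10, 20, 30, 40]
nxt, prv = lead(dates), lag(dates)

# A null in either helper column marks a row to drop; here that removes
# the first and last row of the partition.
kept = [d for d, p, n in zip(dates, prv, nxt) if p is not None and n is not None]
# kept == [20, 30]
```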

Hi,
I'm trying to implement a folding function in Spark. It takes an input k and
a data frame of ids and dates. k=1 is just the data frame; k=2
consists of the min and max date for each id once and the rest twice; k=3
consists of min and max once, min+1 and max-1 twice, and the rest
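The repetition rule described can be sketched in plain Python (no Spark; `fold_counts` is an illustrative assumption, with dates modeled as integers for simplicity):

```python
def fold_counts(dates, k):
    """How many times each date appears after the fold: dates nearer the
    min/max repeat fewer times, capped at k for the interior."""
    lo, hi = min(dates), max(dates)
    return {d: min(k, d - lo + 1, hi - d + 1) for d in dates}

fold_counts([1, 2, 3, 4, 5], 3)
# -> {1: 1, 2: 2, 3: 3, 4: 2, 5: 1}
```

With k=2 this gives min and max once and everything else twice, matching the description above.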

Hi,
I've been fighting with a strange situation today. I'm trying to add two
entries for each of the distinct rows of an account, except for the first
and last (by date). Here's an example of some of the code; I can't get the
subset to carry forward:
var acctIdList =
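The intended expansion can be sketched in plain Python (no Spark; the row values are hypothetical, and this stands in for the truncated Scala above rather than reproducing it):

```python
# Rows for one account, already ordered by date (hypothetical values).
rows = ["2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01"]

expanded = []
for i, r in enumerate(rows):
    # First and last stay single; every interior row gains two extra entries.
    copies = 1 if i in (0, len(rows) - 1) else 3
    expanded.extend([r] * copies)
# expanded has the two boundary rows once and each interior row three times
```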
or would it be common practice to just retain the original categories in
another DataFrame?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Dense-Vectors-outputs-in-feature-engineering-tp27331p27337.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Thanks Disha, that worked out well. Can you point me to an example of how to
decode my feature vectors in the DataFrame back into their categories?
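In Spark ML, `IndexToString` reverses a `StringIndexer` column; the inverse mapping itself can be sketched in plain Python (the category list and `decode` helper are hypothetical, not Spark API):

```python
# The fitted column order from the encoder (hypothetical).
categories = ["blue", "green", "red"]

def decode(row):
    """Return the category whose slot is hot in a dense 0/1 row."""
    return categories[row.index(1)]

decode([0, 0, 1])
# -> "red"
```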

Hi,
I'm trying to use the StringIndexer and OneHotEncoder in order to vectorize
some of my features. Unfortunately, OneHotEncoder only returns sparse
vectors, and I can't find a way, much less an efficient one, to convert the
columns generated by OneHotEncoder into dense vectors. I need this as I
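The expansion itself is simple; a plain-Python sketch of densifying the (size, indices, values) triple that Spark's SparseVector uses (no Spark here — in PySpark, `SparseVector.toArray()` does this in one call):

```python
def to_dense(size, indices, values):
    """Expand a sparse vector given as (size, indices, values) into a
    dense list of floats."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

to_dense(4, [1, 3], [1.0, 2.0])
# -> [0.0, 1.0, 0.0, 2.0]
```

Doing this per row in a UDF is the inefficiency the message alludes to: the dense form materializes every zero.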

Hi,
I'm trying to figure out how to work with R libraries in Spark properly.
I've googled and done some trial and error. The main error I've been
running into is: cannot coerce class "structure("DataFrame", package =
"SparkR")" to a data.frame. I'm wondering if there is a way to use the R