So the LogisticRegression with regParam and elasticNetParam set to 0 is not
what you are looking for?
https://spark.apache.org/docs/2.3.0/ml-classification-regression.html#logistic-regression
.setRegParam(0.0)
.setElasticNetParam(0.0)
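For completeness, a minimal sketch of an unpenalised fit (the data path and app name are placeholders, not from your setup):

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("plain-glm").getOrCreate()
val training = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")

// regParam = 0.0 disables the penalty entirely, so this is a plain
// (unregularised) logistic regression - the maximum-likelihood fit,
// in the spirit of R's glm() rather than glmnet.
val lr = new LogisticRegression()
  .setRegParam(0.0)
  .setElasticNetParam(0.0)
val model = lr.fit(training)

println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
```

Spark ML also has GeneralizedLinearRegression with setFamily("binomial"), which is closer to R's glm in its summary output (standard errors, p-values).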
On Thu., Oct. 11, 2018 at 15:46, pikufolgado wrote:
Hi,
I would like to carry out a classic logistic regression analysis. In other
words, without using penalised regression ("glmnet" in R). I have read the
documentation and am not able to find this kind of model.
Is it possible to estimate this? In R the name of the function is "glm".
Best
We have a Spark Structured Streaming job which runs out of disk quota after
some days.
The primary reason is that a bunch of empty folders are being created in the
/work/tmp directory.
Any idea how to prune them?
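If those directories live on the local filesystem and are safe to delete once the job no longer touches them (an assumption about your setup), a periodic cron cleanup is one pragmatic option; the path and age threshold below are placeholders:

```shell
# Prune empty, stale directories under the Spark job's temp area.
# TMP_ROOT and the age threshold are placeholders for your setup.
TMP_ROOT=/work/tmp

# -mindepth 1 keeps TMP_ROOT itself; -type d -empty matches only
# directories with no entries; -mtime +1 skips anything touched in
# the last day, so directories still in use are left alone.
find "$TMP_ROOT" -mindepth 1 -type d -empty -mtime +1 -delete
```

If /work/tmp is on HDFS instead, the same idea applies but you would script it with `hdfs dfs -ls` / `hdfs dfs -rmdir` rather than `find`.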
Hi there,
Is there any best-practice guideline on YARN resource overcommit with CPU /
vcores, such as YARN config options, candidate cases ideal for
overcommitting vcores, etc.?
The slide deck below (from 2016) seems to address the memory overcommit topic
and hints at a "future" topic on CPU overcommit:
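I'm not aware of an official guideline either, but the usual knob is to advertise more vcores to YARN than the node physically has. A sketch of the relevant yarn-site.xml properties (the values are illustrative, not recommendations):

```xml
<!-- yarn-site.xml: advertise 2x the physical cores,
     e.g. 16 physical cores exposed as 32 vcores -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value>
</property>

<!-- With the CapacityScheduler, DominantResourceCalculator makes the
     scheduler actually account for vcores instead of memory only -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

Overcommitting vcores tends to be safest for I/O-bound workloads that rarely saturate their cores; CPU-bound jobs will just contend with each other.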
Hi Joel
I built such a pipeline to transform PDF -> text:
https://github.com/EDS-APHP/SparkPdfExtractor
You can take a look.
It transforms 20M PDFs in 2 hours on a 5-node Spark cluster.
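The core of such a pipeline can be sketched with `binaryFiles` plus a PDF library (PDFBox here is my assumption, not necessarily what the linked repo uses; the HDFS paths are placeholders):

```scala
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pdf-to-text").getOrCreate()
import spark.implicits._

// binaryFiles yields (path, PortableDataStream) pairs, one per file,
// so each PDF is read whole on an executor rather than line by line.
val pdfs = spark.sparkContext.binaryFiles("hdfs:///data/pdfs/*.pdf")

val texts = pdfs.map { case (path, stream) =>
  val doc = PDDocument.load(stream.toArray())
  try (path, new PDFTextStripper().getText(doc))
  finally doc.close()
}

// Writing to Parquet consolidates millions of tiny PDFs into a few
// large columnar files, which suits HDFS much better.
texts.toDF("path", "text").write.parquet("hdfs:///data/pdf_text")
```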
On 2018-10-10 23:56, Joel D wrote:
> Hi,
>
> I need to process millions of PDFs in hdfs using spark. First I’m
I believe your use case would be better covered by your own data source that
reads PDF files.
On big data platforms in general you have the issue that individual PDF files
are very small and there are a lot of them - this is not very efficient for those
platforms. That could be also one source of your