Hi Team,
I am working on Structured Streaming.
I have added all the libraries in build.sbt, but it is still not picking up the
right library and failing with this error:
User class threw exception: java.lang.ClassNotFoundException: Failed to
find data source: kafka. Please find packages at
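The kafka source ships as a separate artifact and has to be declared explicitly.
A minimal build.sbt sketch (version illustrative; it should match your Spark build):

// the kafka data source is not part of spark-sql itself
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0"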
I am on CentOS 7, using Spark 2.3.0. Below I have posted my code. Logistic
regression took 85 minutes and linear regression 127 seconds…
My dataset, as I said, is 128 MB and contains 1000 features and ~100 classes.
# SparkSession
from pyspark.sql import SparkSession  # imports added for completeness
import time

ss = SparkSession.builder.getOrCreate()
start = time.time()
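For reference, a minimal Scala sketch of the same kind of timing harness (the
libsvm path is hypothetical, and spark is the shell session):

import org.apache.spark.ml.classification.LogisticRegression

// hypothetical input: a libsvm-format file with label + features columns
val data = spark.read.format("libsvm").load("/data/train.libsvm")
val lr = new LogisticRegression().setMaxIter(100)
val t0 = System.nanoTime()
val model = lr.fit(data)  // time only the fit itself
println(s"fit took ${(System.nanoTime() - t0) / 1e9} seconds")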
Hi, using Spark 2.3 I read an image into a dataset using ImageSchema. Now, after
some changes, I want to save the dataset as a new image. How can I achieve this
in Spark?
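One approach, since Spark 2.3 does not ship an image writer: read with
ImageSchema, then re-encode the raw bytes manually. A rough Scala sketch (paths
are hypothetical, and it assumes 3-channel images in OpenCV's BGR byte layout):

import org.apache.spark.ml.image.ImageSchema
import java.awt.image.BufferedImage
import javax.imageio.ImageIO
import java.io.File

val images = ImageSchema.readImages("/data/images")  // hypothetical input path

// no built-in writer in 2.3, so rebuild and encode each image on the driver
images.select("image.*").collect().zipWithIndex.foreach { case (row, i) =>
  val width  = row.getAs[Int]("width")
  val height = row.getAs[Int]("height")
  val data   = row.getAs[Array[Byte]]("data")       // raw BGR pixel bytes
  val img = new BufferedImage(width, height, BufferedImage.TYPE_3BYTE_BGR)
  img.getRaster.setDataElements(0, 0, width, height, data)
  ImageIO.write(img, "png", new File(s"/tmp/out-$i.png"))  // hypothetical output
}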
I’m using Spark 2.3 with schema merging set to false. I don’t think Spark is
actually reading any files, but it tries to list them all one by one, and that
is very slow on S3!
Pointing to a single partition manually is not an option, as it requires me to
know the partitioning scheme in advance.
Are you formatting the data correctly for logistic regression (i.e., 0/1
labels) before modeling? And what OS and Spark version are you using?
Thank You,
Irving Duran
On Fri, Apr 27, 2018 at 2:34 PM Thodoris Zois wrote:
> Hello,
>
> I am running an experiment to test logistic and
Hello,
I am running an experiment to test logistic and linear regression on Spark
using MLlib.
My dataset is only 128 MB and something weird happens. Linear regression takes
about 127 seconds, whether with 1 or 500 iterations. On the other hand, logistic
regression most of the time does not
You cannot dynamically change the number of cores per executor or cores per
task, but you can change the number of executors.
In one of my jobs I have something like this: when I know that I don't need
more than 4 executors, I kill all the other executors (assuming that they
don't hold any cached data).
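A sketch of that pattern (not my exact code; executor ids are illustrative, and
both calls are @DeveloperApi methods on SparkContext):

// ask the cluster manager to settle at 4 executors from here on
sc.requestTotalExecutors(4, 0, Map.empty)
// and explicitly kill the extras we know about (ids illustrative)
sc.killExecutors(Seq("5", "6"))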
Hi Donni,
Please check Spark dynamic allocation and the external shuffle service.
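A minimal sketch of the relevant settings (values illustrative; the external
shuffle service must also be running on every worker node):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")   // illustrative bounds
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .getOrCreate()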
On Fri, 27 Apr 2018 at 2:52 AM, Donni Khan wrote:
> Hi All,
>
> Is there any way to change the number of executors/cores while a Spark job
> is running?
> I have a Spark job containing two
What version of Spark are you using?
You can search for "spark.sql.parquet.mergeSchema" in
https://spark.apache.org/docs/latest/sql-programming-guide.html
Starting from Spark 1.5, the default is already "false", which means Spark
shouldn't scan all the parquet files to generate the schema.
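So a merged schema is only computed when you opt in per read, e.g.:

// off by default since 1.5; enable per read only when you really need it
val merged = spark.read.option("mergeSchema", "true").parquet("/path/to/table")  // path illustrative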
All,
I have the following methods in my Scala code, currently executed on demand:

val files = sc.binaryFiles("file:///imocks/data/ocr/raw")
// Above line picks up all the PDF files
files.map(myconverter(_)).count

myconverter signature:

def myconverter(
  file: (String, PortableDataStream)
You can specify the first folder directly and read it
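For example (path illustrative; note that the partition column itself won't
show up in a schema obtained this way):

// read one known day folder directly instead of the table root
val schema = spark.read.parquet("s3://bucket/table/day=2018-01-01").schema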
On Fri, 27 Apr 2018 at 9:42 pm, Walid LEZZAR wrote:
> Hi,
>
> I have a parquet on S3 partitioned by day. I have 2 years of data (->
> about 1000 partitions). With spark, when I just want to know the schema of
> this
Hi,
I have a Parquet table on S3 partitioned by day. I have 2 years of data (about
1000 partitions). With Spark, when I just want to know the schema of this
Parquet table, without even asking for a single row of data, Spark tries to
list all the partitions and the nested partitions of the table. Which
Hi All,
Is there any way to change the number of executors/cores while a Spark job is
running?
I have a Spark job containing two tasks: the first task needs many executors to
run fast; the second task has many input and output operations and shuffling,
so it needs few executors, otherwise it takes too long