[no subject]

2024-03-21 Thread Рамик И
Hi! I want to execute code inside foreachBatch that will trigger regardless of whether there is data in the batch or not. val kafkaReadStream = spark .readStream .format("kafka") .option("kafka.bootstrap.servers", broker) .option("subscribe", topicName) .option("startingOffsets",
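
A minimal sketch of one way to approach this, with placeholder broker/topic values; note that whether Spark plans a micro-batch at all for an empty trigger can depend on the source and trigger settings:

    // Sketch only: the side effect runs once per planned micro-batch,
    // with or without rows; row-dependent work is guarded separately.
    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("demo").getOrCreate()
    val kafkaReadStream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")  // placeholder
      .option("subscribe", "topicName")                // placeholder
      .option("startingOffsets", "latest")
      .load()

    val query = kafkaReadStream.writeStream
      .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
        println(s"batch $batchId triggered")   // runs regardless of data
        if (!batchDf.isEmpty) {
          // row-dependent processing goes here
        }
      }
      .start()
    query.awaitTermination()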

[no subject]

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers! The Travel Assistance Committee (TAC) are pleased to announce that travel assistance applications for Community over Code EU 2024 are now open! We will be supporting Community over Code EU, Bratislava, Slovakia, June 3rd - 5th, 2024. TAC exists

[no subject]

2023-08-23 Thread ayan guha
Unsubscribe-- Best Regards, Ayan Guha

[no subject]

2023-08-18 Thread Dipayan Dev
Unsubscribe -- With Best Regards, Dipayan Dev Author of *Deep Learning with Hadoop * M.Tech (AI), IISc, Bangalore

[no subject]

2023-07-16 Thread Varun Shah
Hi Spark Community, I am trying to set up my forked apache/spark project locally by building and creating a package as mentioned here under Running Individual Tests. Here are the steps I have followed: >> ./build/sbt # this

[no subject]

2022-12-13 Thread yixu2...@163.com
UNSUBSCRIBE yixu2...@163.com

[no subject]

2022-08-12 Thread GAURAV GUPTA
Unsubscribe

[no subject]

2022-07-29 Thread Milin Korath
unsubscribe

[no subject]

2022-06-10 Thread Rodrigo
Hi Everyone, My security team has raised concerns about the requirement for root group membership for Spark running on Kubernetes. Does anyone know the reasons for that requirement, how insecure it is, and whether there are any alternatives? Thanks, Rodrigo

[no subject]

2022-04-02 Thread Sungwoo Park
Hi Spark users, We have published an article where we evaluate the performance of Spark 2.3.8 and Spark 3.2.1 (along with Hive 3). If interested, please see: https://www.datamonad.com/post/2022-04-01-spark-hive-performance-1.4/ --- SW

[no subject]

2022-02-24 Thread Luca Borin
Unsubscribe

[no subject]

2022-01-31 Thread Gaetano Fabiano
Unsubscribe. Sent from iPhone - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

[no subject]

2022-01-31 Thread pduflot
unsubscribe

[no subject]

2021-11-18 Thread Sam Elamin
unsubscribe

[no subject]

2021-11-17 Thread 马殿军
unsubscribe

[no subject]

2021-11-17 Thread Fred Wang
unsubscribe

[no subject]

2021-11-12 Thread 河合亮 / KAWAI,RYOU
unsubscribe

[no subject]

2021-05-03 Thread Tianchen Zhang
Hi all, Currently the user-facing Catalog API doesn't support backup/restore of metadata. Our customers are asking for such functionality. Here is a usage example: 1. Read all metadata of one Spark cluster 2. Save it into a Parquet file on DFS 3. Read the Parquet file and restore all metadata in
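
As a rough illustration of steps 1-2, a sketch using only the public Catalog API (the output path is a placeholder, and this captures only the fields listTables exposes, not full metadata):

    // Dump visible table metadata to Parquet.
    val tables = spark.catalog.listTables()   // Dataset[org.apache.spark.sql.catalog.Table]
    tables.select("database", "name", "tableType", "isTemporary")
      .write.mode("overwrite")
      .parquet("/backup/catalog-tables")      // placeholder DFS path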

[no subject]

2021-03-07 Thread Sandeep Varma
Unsubscribe Sandeep Varma Principal ZS Associates India Pvt. Ltd. World Trade Center, Tower 3, Kharadi, Pune 411014, Maharashtra, India T | +91 20 6739 5224 M | +91 97 6633 0103 www.zs.com

[no subject]

2020-12-16 Thread 张洪斌
Unsubscribe. Sent from NetEase Mail Master

[no subject]

2020-05-24 Thread Vijaya Phanindra Sarma B

[no subject]

2020-04-28 Thread Zeming Yu
Unsubscribe Get Outlook for Android

[no subject]

2020-03-30 Thread Dima Pavlyshyn
Hello Apache Spark Support Team, I am writing Spark in Java. I use the Dataset API and am facing an issue with something like this: public Dataset> groupByKey(Dataset> consumers, Class kClass) { consumers.groupBy("_1").agg(collect_list(col("_2"))).printSchema(); return

[no subject]

2020-03-02 Thread lucas.wu

[no subject]

2020-03-01 Thread Hamish Whittal
Hi there, I have an hdfs directory with thousands of files. It seems that some of them - and I don't know which ones - have a problem with their schema and it's causing my Spark application to fail with this error: Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column
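
One way to pinpoint the offending files, sketched under the assumption that reading each footer individually is affordable (the directory path is a placeholder):

    // Compare every file's schema against the first file's schema.
    import org.apache.hadoop.fs.{FileSystem, Path}
    import scala.util.{Failure, Success, Try}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val files = fs.listStatus(new Path("/data/dir"))   // placeholder
      .map(_.getPath.toString)
      .filter(_.endsWith(".parquet"))

    val expected = spark.read.parquet(files.head).schema
    files.foreach { f =>
      Try(spark.read.parquet(f).schema) match {
        case Success(s) if s != expected => println(s"schema mismatch: $f")
        case Failure(e)                  => println(s"unreadable: $f (${e.getMessage})")
        case _                           => // schema matches
      }
    }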

[no subject]

2020-01-14 Thread @Sanjiv Singh
Regards Sanjiv Singh Mob : +1 571-599-5236

[no subject]

2019-09-19 Thread Georg Heiler
Hi, How can I create an initial state by hand so that the structured streaming file source only reads data which is semantically (i.e. using a file path, lexicographically) greater than the minimum committed initial state? Details here:

[no subject]

2019-07-21 Thread Hieu Nguyen
Hi Spark communities, I just found out that in https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.fullOuterJoin, the documentation is "Perform a right outer join of self and other." It should be a full outer join, not a right outer join, as shown in the example and the

[no subject]

2019-06-06 Thread Shi Tyshanchn

[no subject]

2019-04-08 Thread Siddharth Reddy
unsubscribe

[no subject]

2019-03-29 Thread Daniel Sierra
unsubscribe

[no subject]

2019-03-13 Thread Anbazhagan Muthuramalingam
SUBSCRIBE

[no subject]

2019-03-05 Thread Shyam P
Hi All, I need to save a huge data frame as a Parquet file. As it is huge, it's taking several hours. To improve performance, it is known that I have to write it group-wise. But when I do partition(columns*)/groupBy(columns*), the driver spills a lot of data and performance suffers again. So how
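
A commonly suggested pattern for this, sketched with a placeholder column name: skip the explicit groupBy and let the writer lay the data out group-wise, repartitioning by the same column first so each group's rows are co-located:

    // "grp" stands in for the real grouping column(s).
    import org.apache.spark.sql.functions.col

    df.repartition(col("grp"))    // co-locate each group's rows on executors
      .write
      .partitionBy("grp")         // one output directory per group value
      .mode("overwrite")
      .parquet("/out/path")       // placeholder path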

[no subject]

2019-02-13 Thread Kumar sp

[no subject]

2019-01-31 Thread Ahmed Abdulla
unsubscribe

[no subject]

2019-01-31 Thread Daniel O' Shaughnessy
unsubscribe

[no subject]

2019-01-30 Thread Daniel O' Shaughnessy
Unsubscribe

[no subject]

2018-12-19 Thread Daniel O' Shaughnessy
unsubscribe

[no subject]

2018-11-08 Thread JF Chen
I am working on a Spark Streaming application, and I want it to read configuration from MongoDB every hour, where the batch interval is 10 minutes. Is it practicable? As far as I know, Spark Streaming batches are tied to the DStream, so how do I implement this function, which seems unrelated to the DStream data?
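
One practicable shape, sketched with hypothetical helpers (loadConfigFromMongo and process are placeholders): keep a driver-side refresh timestamp inside foreachRDD, which runs on the driver once per batch:

    @volatile var config: Map[String, String] = Map.empty
    var lastRefreshMs = 0L

    dstream.foreachRDD { rdd =>
      val now = System.currentTimeMillis()
      if (now - lastRefreshMs >= 60L * 60 * 1000) {  // at most once per hour
        config = loadConfigFromMongo()               // hypothetical helper
        lastRefreshMs = now
      }
      val cfg = config                               // capture a stable copy for executors
      rdd.foreach(record => process(record, cfg))    // process() is hypothetical
    }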

[no subject]

2018-05-16 Thread Davide Brambilla
Hi all, we have a dataframe with 1000 partitions and we need to write the dataframe into a MySQL database using this command: df.coalesce(20) df.write.jdbc(url=url, table=table, mode=mode, properties=properties) and we get these errors randomly
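
Worth noting: coalesce returns a new DataFrame, so as written the write still runs with 1000 partitions. A corrected sketch (Scala; URL, table, and mode are placeholders):

    import java.util.Properties

    val props = new Properties()                     // user/password go here
    df.coalesce(20)                                  // chain it into the write
      .write
      .mode("append")                                // placeholder mode
      .jdbc("jdbc:mysql://host/db", "table_name", props)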

[no subject]

2018-05-02 Thread Filippo Balicchia

[no subject]

2017-10-22 Thread 梁义怀

[no subject]

2017-09-08 Thread PICARD Damien
Hi ! I'm facing a Classloader problem using Spark 1.5.1 I use javax.validation and hibernate validation annotations on some of my beans : @NotBlank @Valid private String attribute1 ; @Valid private String attribute2 ; When Spark tries to unmarshall these beans (after a remote RDD),

[no subject]

2017-08-07 Thread Sumit Saraswat
Unsubscribe

[no subject]

2017-06-07 Thread Patrik Medvedev
Hello guys, I need to execute Hive queries on a remote Hive server from Spark, but for some reason I receive only column names (without data). The data is available in the table; I checked it via HUE and a Java JDBC connection. Here is my code example: val test = spark.read .option("url",
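
A workaround often suggested for this symptom (the Hive JDBC driver echoing the quoted column names back as row values) is to register a JdbcDialect that quotes identifiers with backticks; a sketch, assuming a jdbc:hive2 URL:

    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

    object HiveDialect extends JdbcDialect {
      override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
      override def quoteIdentifier(colName: String): String = s"`$colName`"
    }
    JdbcDialects.registerDialect(HiveDialect)

    val test = spark.read.format("jdbc")
      .option("url", "jdbc:hive2://host:10000/default")  // placeholder
      .option("dbtable", "some_table")                   // placeholder
      .load()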

[no subject]

2017-05-26 Thread Anton Kravchenko
df.rdd.foreachPartition(convert_to_sas_single_partition) def convert_to_sas_single_partition(ipartition: Iterator[Row]): Unit = { for (irow <- ipartition) {

[no subject]

2017-03-09 Thread sathyanarayanan mudhaliyar
I am using Spark Streaming for a basic streaming movie-count program. First I have mapped the year and movie name to a JavaPairRDD, and I am using reduceByKey for counting the movies year-wise. I am using Cassandra for output; the Spark Streaming application is not stopping and the

[no subject]

2017-03-08 Thread sathyanarayanan mudhaliyar
code: directKafkaStream.foreachRDD(rdd -> { rdd.foreach(record -> { messages1.add(record._2); }); JavaRDD lines = sc.parallelize(messages1);
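
Accumulating records into a driver-side list and re-parallelizing is usually the wrong shape; a sketch of the more idiomatic form (in Scala, for consistency with the other examples on this page):

    // Transform the micro-batch RDD directly on the executors.
    directKafkaStream.foreachRDD { rdd =>
      val lines = rdd.map(_._2)   // the message payloads
      // operate on `lines` here; no driver-side list or sc.parallelize needed
    }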

[no subject]

2016-12-20 Thread satyajit vegesna
Hi All, PFB sample code , val df = spark.read.parquet() df.registerTempTable("df") val zip = df.select("zip_code").distinct().as[String].rdd def comp(zipcode:String):Unit={ val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode) val data =
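
Rather than looping a temp-table query per zip code, the writer can split by the column directly in one pass; a sketch with placeholder paths:

    val df = spark.read.parquet("/in/path")   // placeholder input
    df.write
      .partitionBy("zip_code")                // one output directory per zip code
      .mode("overwrite")
      .parquet("/out/path")                   // placeholder output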

[no subject]

2016-12-06 Thread ayan guha
Hi, We are generating some big model objects:
> hdfs dfs -du -h /myfolder
325        975       /myfolder/__ORCHMETA__
1.7 M      5.0 M     /myfolder/model
185.3 K    555.9 K   /myfolder/predict
The issue I am facing while loading is: Error in .jcall("com/oracle/obx/df/OBXSerializer", returnSig =

[no subject]

2016-11-28 Thread Didac Gil
Any suggestions for using something like OneHotEncoder and StringIndexer on an InputDStream? I could try to combine an Indexer based on a static parquet but I want to use the OneHotEncoder approach in Streaming data coming from a socket. Thanks! Dídac Gil de la Iglesia

[no subject]

2016-11-24 Thread Rostyslav Sotnychenko

[no subject]

2016-10-10 Thread Fei Hu
Hi All, I am running some Spark Scala code in Zeppelin on CDH 5.5.1 (Spark version 1.5.0). I customized the Spark interpreter to use org.apache.spark.serializer.KryoSerializer as spark.serializer. And in the dependency I added Kryo-3.0.3 as follows: com.esotericsoftware:kryo:3.0.3 When I

[no subject]

2016-10-06 Thread ayan guha
Hi, I faced one issue: - Writing a Hive partitioned table using df.withColumn("partition_date",to_date(df["INTERVAL_DATE"])).write.partitionBy('partition_date').saveAsTable("sometable",mode="overwrite") - Data got written to HDFS fine. I can see the folders with partition names such as

[no subject]

2016-08-14 Thread Jestin Ma
Hi, I'm currently trying to perform an outer join between two DataFrames/Sets, one is ~150GB, one is about ~50 GB on a column, id. df1.id is skewed in that there are many 0's, the rest being unique IDs. df2.id is not skewed. If I filter df1.id != 0, then the join works well. If I don't, then the
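
One standard mitigation, sketched under the assumption that 0 is the only heavily skewed value: split the hot key out, join the two slices separately (broadcasting the small hot slice of df2), and union the results:

    import org.apache.spark.sql.functions.{broadcast, col}

    val hot  = df1.filter(col("id") === 0)
    val rest = df1.filter(col("id") =!= 0)

    // Partition df2 by the same predicate so each row joins exactly once.
    val joinedRest = rest.join(df2.filter(col("id") =!= 0), Seq("id"), "full_outer")
    val joinedHot  = hot.join(broadcast(df2.filter(col("id") === 0)), Seq("id"), "full_outer")

    val result = joinedRest.union(joinedHot)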

[no subject]

2016-07-08 Thread tan shai
Hi, Can anyone explain to me the class RangePartitioning " https://github.com/apache/spark/blob/d5911d1173fe0872f21cae6c47abf8ff479345a4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala " case class RangePartitioning(ordering: Seq[SortOrder],

[no subject]

2016-06-29 Thread pooja mehta
Hi, I want to add a metadata field to the StructField case class in Spark: case class StructField(name: String) And how do I carry the metadata over in query execution?
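
For reference, StructField already carries a metadata slot; a sketch (key and value are arbitrary placeholders):

    import org.apache.spark.sql.types.{Metadata, MetadataBuilder, StringType, StructField}

    val md: Metadata = new MetadataBuilder()
      .putString("origin", "crm-import")   // placeholder key/value
      .build()
    val field = StructField("name", StringType, nullable = true, metadata = md)
    // Read it back later via df.schema("name").metadata; simple projections
    // generally carry the metadata along with the column.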

[no subject]

2016-06-24 Thread Rama Perubotla
Unsubscribe

[no subject]

2016-06-10 Thread pooja mehta
Hi, How can I use a Scala UDF from the Beeline client? From the Spark shell, we register our UDF like this: sqlcontext.udf.register(). What is the way to use the UDF from the Beeline client? Thanks Pooja
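
One common route, sketched with placeholder class and jar names: package the UDF as a Hive UDF and register it as a permanent function, so any Beeline session against the thrift server can call it:

    // The same CREATE FUNCTION statement can be issued from Beeline itself.
    spark.sql("""
      CREATE FUNCTION my_udf AS 'com.example.MyUdf'
      USING JAR 'hdfs:///libs/my-udf.jar'
    """)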

[no subject]

2016-04-11 Thread Angel Angel
Hello, I am writing a Spark application; it runs well but takes a long time to execute. Can anyone help me optimize my query to increase the processing speed? I am writing an application in which I have to construct histograms and compare the histograms in order to find the final

[no subject]

2016-04-02 Thread Hemalatha A
Hello, The Spark programming guide says "we should have 2-4 partitions for each CPU in your cluster." In this case, how does 1 CPU core process 2-4 partitions at the same time? Link - http://spark.apache.org/docs/latest/programming-guide.html (under the RDD section) Does it do context

Spark thrift issue 8659 (changing subject)

2016-03-23 Thread ayan guha
> > Hi All > > I found this issue listed in Spark Jira - > https://issues.apache.org/jira/browse/SPARK-8659 > > I would love to know if there are any roadmap for this? Maybe someone from > dev group can confirm? > > Thank you in advance > > Best > Ayan > >

[no subject]

2016-03-20 Thread Vinay Varma

[no subject]

2016-01-11 Thread Daniel Imberman
Hi all, I'm looking for a way to efficiently partition an RDD, but allow the same data to exist on multiple partitions. Let's say I have a key-value RDD with keys {1,2,3,4}. I want to be able to repartition the RDD so that the partitions look like p1 = {1,2} p2 = {2,3} p3 = {3,4}
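
A sketch of one way to get overlap, with an illustrative neighbor rule (real bucketing logic would replace it): emit each record once per partition that should hold it, then partition normally:

    import org.apache.spark.HashPartitioner

    // rdd: RDD[(Int, V)] is assumed here.
    val overlapped = rdd.flatMap { case (k, v) =>
      Seq((k, v), (k + 1, v))   // hypothetical: also send each record to the "next" bucket
    }
    val partitioned = overlapped.partitionBy(new HashPartitioner(4))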

[no subject]

2016-01-08 Thread Suresh Thalamati

[no subject]

2015-12-04 Thread Sateesh Karuturi
user-sc.1449231970.fbaoamghkloiongfhbbg-sateesh.karuturi9= gmail@spark.apache.org

[no subject]

2015-11-26 Thread Dmitry Tolpeko

[no subject]

2015-11-19 Thread aman solanki
Hi All, I want to know how one can get historical data on the jobs, stages, tasks, etc. of a running Spark application. Please share any information regarding the same. Thanks, Aman Solanki

[no subject]

2015-10-15 Thread Lei Wu
Dear all, Like the design doc in SPARK-1 for Spark memory management, is there a design doc for the details of Spark task scheduling? I'd really like to dive deep into the task scheduling module of Spark. Thanks so much!

[no subject]

2015-10-15 Thread Anfernee Xu
Sorry, I have to re-send this again as I did not get an answer. Here's the problem I'm facing: I have a standalone Java application which periodically submits Spark jobs to my YARN cluster; btw, I'm not using 'spark-submit' or 'org.apache.spark.launcher' to submit my jobs. These jobs are

[no subject]

2015-07-10 Thread satish chandra j
Hi All, I am having issues making an external jar available to the Spark shell. I used the --jars option while starting the Spark shell to make it available. When I run Class.forName("org.postgresql.Driver") it does not give any error, but when an action operation is performed on the RDD I get

[no subject]

2015-07-07 Thread Anand Nalya
Hi, Suppose I have an RDD that is loaded from some file and I also have a DStream that has data coming from some stream. I want to keep unioning some of the tuples from the DStream into my RDD. For this I can use something like this: var myRDD: RDD[(String, Long)] = sc.fromText...
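
A sketch of that shape, with the caveat that the union lineage grows every batch and needs periodic truncation (a checkpoint directory is assumed to be configured):

    import org.apache.spark.rdd.RDD

    var myRDD: RDD[(String, Long)] = sc.emptyRDD[(String, Long)]
    var batches = 0L

    dstream.foreachRDD { batch =>
      myRDD = myRDD.union(batch).cache()
      batches += 1
      if (batches % 10 == 0) myRDD.checkpoint()  // needs sc.setCheckpointDir(...)
    }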

[no subject]

2015-07-07 Thread 付雅丹
Hi, everyone! I've got key/value pairs in the form of LongWritable/Text, where I used the following code: SparkConf conf = new SparkConf().setAppName("MapReduceFileInput"); JavaSparkContext sc = new JavaSparkContext(conf); Configuration confHadoop = new Configuration(); JavaPairRDD<LongWritable, Text>
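
The same read sketched in Scala (the path is a placeholder); note that Hadoop Writables are not serializable, so it usually pays to convert them before any shuffle:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    val pairs = sc.newAPIHadoopFile(
      "hdfs:///input/path",        // placeholder
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text])

    val usable = pairs.map { case (k, v) => (k.get, v.toString) }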

[no subject]

2015-06-23 Thread ๏̯͡๏
I have a Spark job that has 7 stages. The first 3 stages complete and the fourth stage begins (it joins two RDDs). This stage has multiple task failures, all with the below exception. Multiple tasks (100s of them) get the same exception on different hosts. How can all the hosts suddenly stop responding

[no subject]

2015-06-17 Thread Nipun Arora
Hi, Is there any way in Spark Streaming to keep data across multiple micro-batches? Like in a HashMap or something? Can anyone make suggestions on how to keep data across iterations, where each iteration is an RDD being processed in a JavaDStream? This is especially the case when I am trying to
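
For DStreams, updateStateByKey is the usual answer; a sketch with an illustrative running count (checkpointing is mandatory for stateful streams, and the path is a placeholder):

    ssc.checkpoint("/tmp/stream-ckpt")   // placeholder path

    // pairDStream: DStream[(String, Long)] is assumed.
    val counts = pairDStream.updateStateByKey[Long] { (values: Seq[Long], state: Option[Long]) =>
      Some(values.sum + state.getOrElse(0L))
    }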

[no subject]

2015-06-11 Thread Wangfei (X)
Hi all, We use Spark SQL to insert data from a text table into a partitioned table, and found that if we give more cores to the executors, the insert performance gets worse.
executors num | total-executor-cores | average time for insert task
3

[no subject]

2015-05-06 Thread anshu shukla
Exception with sample testing in the IntelliJ IDE: Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class at akka.util.Collections$EmptyImmutableSeq$.<init>(Collections.scala:15) at akka.util.Collections$EmptyImmutableSeq$.<clinit>(Collections.scala) at

[no subject]

2015-03-25 Thread Himanish Kushary
Hi, I have an RDD of pairs of strings like below: (A,B) (B,C) (C,D) (A,D) (E,F) (B,F) I need to transform/filter this into an RDD of pairs that does not repeat a string once it has been used. So something like (A,B) (C,D) (E,F). (B,C) is out because B has already been used in (A,B); (A,D)
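
The "first use wins" rule is order-dependent, hence inherently sequential; if the pair list fits on the driver, a fold after collect() captures it. A sketch:

    val pairs = rdd.collect()   // RDD[(String, String)], assumed small enough

    val (kept, _) = pairs.foldLeft((Vector.empty[(String, String)], Set.empty[String])) {
      case ((acc, used), (a, b)) =>
        if (used(a) || used(b)) (acc, used)   // a string was already consumed
        else (acc :+ ((a, b)), used + a + b)
    }
    // For the sample input this keeps (A,B), (C,D), (E,F).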

[no subject]

2015-03-23 Thread Udbhav Agarwal
Hi, I am querying HBase via Spark SQL with the Java APIs. Step 1: create JavaPairRdd, then JavaRdd, then JavaSchemaRdd.applySchema objects. Step 2: sqlContext.sql(sql query). If I am updating my HBase database between these two steps (by HBase shell in some other console), the query in step two does not

[no subject]

2015-03-16 Thread Hector
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

[no subject]

2015-03-03 Thread shahab
I did an experiment with the Hive and SQL contexts. I queried Cassandra using CassandraAwareSQLContext (a custom SQL context from Calliope), then I registered the RDD as a temp table; next I tried to query it using HiveContext, but it seems that the Hive context cannot see the table registered using

[no subject]

2015-03-03 Thread Jianshi Huang
Hi, I got this error message: 15/03/03 10:22:41 ERROR OneForOneBlockFetcher: Failed while starting block fetches java.lang.RuntimeException: java.io.FileNotFoundException:

[no subject]

2015-02-18 Thread Luca Puggini

[no subject]

2015-01-17 Thread Kyounghyun Park
Hi, I'm running Spark 1.2 in yarn-client mode (using Hadoop 2.6.0). On VirtualBox, I can run spark-shell --master yarn-client without any error. However, on a physical machine, I get the following error. Does anyone know why this happens? Any help would be appreciated. Thanks, Kyounghyun

[no subject]

2015-01-14 Thread Jianguo Li
I am using Spark-1.1.1. When I run sbt test, I run into the following exceptions. Any idea how to solve them? Thanks! I think somebody posted this question before, but no one seemed to have answered it. Could it be the version of io.netty I put in my build.sbt? I included a dependency

[no subject]

2015-01-10 Thread Krishna Sankar
Guys, registerTempTable("Employees") gives me the error: Exception in thread "main" scala.ScalaReflectionException: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with primordial classloader with boot classpath

[no subject]

2015-01-03 Thread Sujeevan
Best Regards, Sujeevan. N

[no subject]

2014-12-04 Thread Subong Kim
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

[no subject]

2014-11-26 Thread rapelly kartheek
Hi, I've been fiddling with the spark/*/storage/blockManagerMasterActor.getPeers() definition in the context of blockManagerMaster.askDriverWithReply() sending a GetPeers() request. 1) I couldn't understand what 'selfIndex' is used for. 2) Also, I tried modifying the 'peers' array by just

[no subject]

2014-10-22 Thread Margusja
unsubscribe - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

[no subject]

2014-09-30 Thread PENG ZANG
Hi, We have a cluster setup with Spark 1.0.2 running 4 workers and 1 master, with 64G RAM for each. In the SparkContext we specify 32G executor memory. However, whenever a task runs longer than approximately 15 mins, all the executors are lost, as if from some sort of timeout, no matter whether the

[no subject]

2014-09-24 Thread Jianshi Huang
One of my big Spark programs always gets stuck at 99%, where a few tasks never finish. I debugged it by printing out thread stack traces, and found there are workers stuck at parquet.hadoop.ParquetFileReader.readNextRowGroup. Has anyone had a similar problem? I'm using Spark 1.1.0 built for HDP2.1. The

[no subject]

2014-08-20 Thread Cường Phạm

ERROR UserGroupInformation: Can't find user in Subject:

2014-08-11 Thread Dan Foisy
MB) textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12 scala> textFile.count() 14/08/10 08:55:58 ERROR UserGroupInformation: Can't find user in Subject: Principal: NTUserPrincipal: danfoisy

[no subject]

2014-07-07 Thread Juan Rodríguez Hortalá
Hi all, I'm writing a Spark Streaming program that uses reduceByKeyAndWindow(), and when I change the windowLength or slidingInterval I get the following exceptions, running in local mode: 14/07/06 13:03:46 ERROR actor.OneForOneStrategy: key not found: 1404677026000 ms

[no subject]

2014-07-05 Thread Konstantin Kudryavtsev
I ran into very strange behavior in a job that I ran on a YARN Hadoop cluster. One of the stages (a map function) was split into 80 tasks; 10 of them successfully finished in ~2 min, but all the other tasks have been running for 40 min and still have not finished... I suspect they hung. Any ideas what's going on and how

[no subject]

2014-07-03 Thread Steven Cox
Folks, I have a program derived from the Kafka streaming wordcount example which works fine standalone. Running on Mesos is not working so well. For starters, I get the error below, "No FileSystem for scheme: hdfs". I've looked at lots of promising comments on this issue, so now I have - *

no subject

2014-05-13 Thread Herman, Matt (CORP)
unsubscribe
