unsubscribe

2020-04-22 Thread akram azarm
-- M Akram Azarm, B.Eng. in Software Engineering (Reading), UOW, UK / IIT, LK. Contact: 077-502-0402

Re: Error while reading hive tables with tmp/hidden files inside partitions

2020-04-22 Thread Wenchen Fan
This looks like a bug where the path filter doesn't apply to Hive table reads. Can you open a JIRA ticket? On Thu, Apr 23, 2020 at 3:15 AM Dhrubajyoti Hati wrote: > Just wondering if anyone could help me out on this. > > Thank you! > > > > > *Regards,Dhrubajyoti Hati.* > > > On Wed, Apr 22, 2020
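A minimal PySpark sketch of a possible workaround until that bug is addressed: bypass the Hive reader and load the partition files directly, since file-based sources already skip names starting with a dot or an underscore, and the pathGlobFilter option (a Spark 3.0 file-source option) can keep only the expected data files. The format and path below are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the files under the table location directly instead of via the catalog;
    # keep only *.parquet so stray .tmp files are never listed as data.
    df = (spark.read
          .format("parquet")                    # assumed storage format
          .option("pathGlobFilter", "*.parquet")
          .load("/warehouse/db.db/my_table"))   # hypothetical table location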

Re: is RocksDB backend available in 3.0 preview?

2020-04-22 Thread Jungtaek Lim
Sorry, I should have been clearer. The discussion reached the conclusion that the RocksDB state store cannot be included in the Spark main codebase - it should start as an individual project, and can be adopted once that project is popular enough. (See the PR for more details.) That's why I guided to the

Re: pyspark working with a different Python version than the cluster

2020-04-22 Thread Tang Jinxin
Hi Copon, the Python worker resolves python3 to pick its interpreter, and on some nodes that may resolve to Python 3.4. Could you check what python3 returns on each node? Best wishes, Jinxin xiaoxingstack Email: xiaoxingst...@gmail.com (signature customized by NetEase Mail Master) On 04/23/2020 01:02, Odon Copon wrote: Hi, Something is happening to me that I don't quite
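A quick probe of Jinxin's suggestion, assuming a live SparkContext named sc: run a trivial job and report every distinct interpreter version the workers actually use.

    versions = (sc.parallelize(range(100), 10)
                  .map(lambda _: __import__("sys").version)
                  .distinct()
                  .collect())
    print(versions)  # more than one entry means the nodes disagree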

Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

2020-04-22 Thread Tang Jinxin
Hi maqy, the exception is caused by a closed connection; one possible reason is a timeout on the datanode side, given that we haven't found a problem on the Spark side before the exception. So we could try to find more clues in the datanode log. Best wishes, Jinxin xiaoxingstack Email: xiaoxingst...@gmail.com (signature customized by NetEase Mail Master)
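If the datanode log does confirm a socket timeout, one speculative remedy is to raise the HDFS client timeouts through Spark's spark.hadoop.* passthrough. The property names come from the HDFS configuration docs; the values are illustrative only.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.hadoop.dfs.client.socket-timeout", "120000")          # read timeout, ms
             .config("spark.hadoop.dfs.datanode.socket.write.timeout", "120000")  # write timeout, ms
             .getOrCreate())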

Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread Tang Jinxin
Hi maqy, thanks for your question. After some consideration, I have a few ideas: first, try not to collect to the driver if it isn't necessary; instead, send the data from the executors (using foreachPartition). Second, if you aren't already using a high-performance serializer like Kryo, we could give it a try. As a summary, I
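A rough PySpark sketch of the first suggestion: each executor ships its own partitions, so nothing funnels through the driver. The host, port, and newline-delimited JSON protocol are hypothetical stand-ins for whatever the receiving side actually listens on. (The Kryo suggestion applies to the JVM side via spark.serializer and has no Python equivalent here.)

    def send_partition(rows):
        # imports inside the function so they resolve on the executors
        import json, socket
        with socket.create_connection(("tf-host.example.com", 9999)) as sock:  # assumed endpoint
            for row in rows:
                sock.sendall((json.dumps(row.asDict()) + "\n").encode("utf-8"))

    df.foreachPartition(send_partition)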

Re: Error while reading hive tables with tmp/hidden files inside partitions

2020-04-22 Thread Dhrubajyoti Hati
Just wondering if anyone could help me out on this. Thank you! *Regards,Dhrubajyoti Hati.* On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati wrote: > Hi, > > Is there any way to discard files starting with a dot (.) or ending with .tmp > in the hive partition while reading from Hive table

pyspark working with a different Python version than the cluster

2020-04-22 Thread Odon Copon
Hi, Something is happening to me that I don't quite understand. I ran pyspark on a machine that has Python 3.5, where I managed to run some commands, even though the Spark cluster is using Python 3.4. If I do the same with spark-submit I get the "Python in worker has different version 3.4 than that in
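That error means the driver and worker interpreters must match on minor version. A minimal sketch of one fix: pin both sides to an interpreter that exists on every node, before the session starts (the path below is an assumption). With spark-submit, the spark.pyspark.python and spark.pyspark.driver.python configs set the same thing.

    import os

    # Must be set before the SparkContext/SparkSession is created.
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.4"         # assumed cluster-wide path
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.4"  # match the workers

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()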

Spark Adaptive configuration

2020-04-22 Thread Tzahi File
Hi, I saw that Spark has an option to adapt the join and shuffle configuration. For example: "spark.sql.adaptive.shuffle.targetPostShuffleInputSize". I wanted to know if you have any experience with such configuration and how it changed performance. Another question is whether along Spark SQL
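For reference, a hedged sketch of turning adaptive execution on with the Spark 2.x property named above (Spark 3.0 renames the shuffle knob); the 128 MB target is illustrative, not a recommendation.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.sql.adaptive.enabled", "true")
             .config("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", str(128 * 1024 * 1024))
             .getOrCreate())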

Re: is RocksDB backend available in 3.0 preview?

2020-04-22 Thread kant kodali
Is it going to make it into 3.0? On Tue, Apr 21, 2020 at 9:24 PM Jungtaek Lim wrote: > Unfortunately, the short answer is no. Please refer to the last part of > the discussion on the PR https://github.com/apache/spark/pull/24922 > > Unless we get any native implementation of this, I guess this project

Re: Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

2020-04-22 Thread maqy
Hi Jinxin, the Spark web UI shows that all tasks completed successfully; this error appears in the shell: java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:244) at

Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread maqy
Hi Andrew, thank you for your reply. I am using the Scala API of Spark, and the TensorFlow machine is not in the Spark cluster. Is this JIRA / PR still valid in this situation? In addition, the current bottleneck of the application is that the amount of data transferred through the

Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

2020-04-22 Thread Tang Jinxin
Maybe the datanode stopped the data transfer due to a timeout. Could you please provide the exception stack? xiaoxingstack Email: xiaoxingst...@gmail.com (signature customized by NetEase Mail Master) On 2020-04-22 19:53, maqy wrote: Today I met the same problem using rdd.collect(); the format of the rdd is Tuple2[Int, Int]. And this problem will

Error while reading hive tables with tmp/hidden files inside partitions

2020-04-22 Thread Dhrubajyoti Hati
Hi, Is there any way to discard files starting with a dot (.) or ending with .tmp in the Hive partitions while reading from a Hive table using the spark.read.table method? I tried using PathFilters, but they didn't work. I am using spark-submit and passing my Python file (PySpark) containing the source
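One possible stopgap in PySpark, assuming the temporary files can still be parsed at all: tag each row with its source file and drop rows that came from dot-files or .tmp files. The table name is a placeholder, and if the bad files fail outright at read time this won't help.

    from pyspark.sql import functions as F

    df = (spark.read.table("db.table")
          .withColumn("src", F.input_file_name())
          .filter(~F.col("src").rlike(r"(/\.[^/]+$)|(\.tmp$)"))  # hidden or .tmp source files
          .drop("src"))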

Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread Tang Jinxin
Maybe you could try something like foreachPartition inside foreachRDD, which avoids gathering everything to the driver and the extra cost that brings; see the sketch below. xiaoxingstack Email: xiaoxingst...@gmail.com (signature customized by NetEase Mail Master) On 04/22/2020 21:02, Andrew Melo wrote: Hi Maqy On Wed, Apr 22, 2020 at 3:24 AM maqy <454618...@qq.com> wrote: > > I
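A compact sketch of that pattern for a hypothetical DStream named stream: every partition is processed on its executor, and nothing is collected to the driver.

    def handle_partition(records):
        for rec in records:
            print(rec)  # stand-in for real per-record work on the executor

    stream.foreachRDD(lambda rdd: rdd.foreachPartition(handle_partition))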

Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]?

2020-04-22 Thread Andrew Melo
Hi Maqy On Wed, Apr 22, 2020 at 3:24 AM maqy <454618...@qq.com> wrote: > > I will traverse this Dataset to convert it to Arrow and send it to Tensorflow > through Socket. (I presume you're using the Python TensorFlow API; if you're not, please ignore.) There is a JIRA/PR ([1] [2]) which
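Not the JIRA/PR Andrew references (its numbers are elided above); just a hedged illustration of the Arrow-backed transfer that already exists in Spark 2.4 for the Python side, under its 2.x config name.

    # With Arrow enabled, toPandas() moves data as Arrow record batches
    # instead of pickled Rows, which is much cheaper for wide transfers.
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")
    pdf = df.toPandas()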

Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

2020-04-22 Thread maqy
Today I met the same problem using rdd.collect(); the format of the rdd is Tuple2[Int, Int]. And this problem appears when the amount of data reaches about 100 GB. I guess there may be something wrong with deserialization. Has anyone else encountered this problem? Best regards, maqy

Re: What is the best way to take the top N entries from a hive table/data source?

2020-04-22 Thread ZHANG Wei
The performance issue might be caused by the Parquet table's partition count, only 3. The reader used that partition count to parallelize extraction. Refer to the log you provided: > spark.sql("select * from db.table limit 100").explain(false) > == Physical Plan == > CollectLimit 100 >
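A quick way to confirm the scan parallelism ZHANG Wei points at, with the table name as a placeholder:

    df = spark.sql("select * from db.table")
    print(df.rdd.getNumPartitions())  # 3 in the report above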

[Structured Streaming] Connecting to Kafka via a Custom Consumer / Producer

2020-04-22 Thread Patrick McGloin
Hi, The large international bank I work for has a custom Kafka implementation. The client libraries that are used to connect to Kafka have extra security steps. They implement the Kafka Consumer and Producer interfaces in this client library so once we use it to connect to Kafka, we can treat

Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread maqy
I will traverse this Dataset to convert it to Arrow and send it to TensorFlow through a socket. I tried to use toLocalIterator() to traverse the dataset instead of collecting to the driver, but toLocalIterator() creates a lot of jobs and adds a lot of time overhead. Best regards, maqy
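A small sketch of the trade-off maqy describes: toLocalIterator() fetches one partition at a time, so driver memory stays flat but Spark schedules roughly one job per partition, while collect() is a single job that materializes every Row in the driver at once. feed_to_tensorflow is a hypothetical downstream consumer.

    for row in df.toLocalIterator():
        feed_to_tensorflow(row)   # low memory, one job per partition

    rows = df.collect()           # one job, full dataset resident in the driver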

Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread Michael Artz
What would you do with it once you get it into the driver as a Dataset[Row]? Sent from my iPhone > On Apr 22, 2020, at 3:06 AM, maqy <454618...@qq.com> wrote: > >  > When the data is stored in the Dataset [Row] format, the memory usage is very > small. > When I use collect () to collect data to

Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread maqy
When the data is stored in the Dataset[Row] format, the memory usage is very small. When I use collect() to bring the data to the driver, each line of the dataset is converted to a Row and stored in an array, which brings a large memory overhead. So, can I collect Dataset[Row] to

Re: Using startingOffsets latest - no data from structured streaming kafka query

2020-04-22 Thread Ruijing Li
For some reason, after restarting the app and trying again, latest now works as expected. Not sure why it didn’t work before. On Tue, Apr 21, 2020 at 1:46 PM Ruijing Li wrote: > Yes, we did. But for some reason latest does not show them. The count is > always 0. > > On Sun, Apr 19, 2020 at 3:42
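For reference, the option under discussion, with placeholder broker and topic. Per the Kafka source docs, startingOffsets only applies when a query first starts; on restart with an existing checkpoint, the query resumes from the checkpointed offsets regardless of this setting, which may explain the confusing behavior.

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "some-topic")                 # placeholder
          .option("startingOffsets", "latest")
          .load())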

Re: Deadlock using Barrier Execution

2020-04-22 Thread wuyi
Hi csmith, Sorry to be late here. We only realized this bug recently. Here's the fix: https://github.com/apache/spark/pull/28257. And I believe we're going to backport it to 2.4.x. Best, Yi Wu -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/