site.xml for NodeManager configurations:
>
> spark.shuffle.push.server.mergedShuffleFileManagerImpl
> org.apache.spark.network.shuffle.RemoteBlockPushResolver
>
>
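The two quoted lines above name the property and value but not the file syntax. A sketch of how that pair would appear in the NodeManager's yarn-site.xml, assuming the Spark 3.2+ push-based shuffle setup on YARN that this thread is discussing:

```xml
<!-- yarn-site.xml on each NodeManager running the Spark shuffle service.
     Enables the merged-shuffle-file manager required for push-based shuffle. -->
<property>
  <name>spark.shuffle.push.server.mergedShuffleFileManagerImpl</name>
  <value>org.apache.spark.network.shuffle.RemoteBlockPushResolver</value>
</property>
```

The NodeManagers must be restarted for the shuffle-service configuration to take effect.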
> On Tue, May 24, 2022 at 3:30 PM Mridul Muralidharan
> wrote:
>
>> +CC zhouye...@gmail.com
>
Hi,
First of all, I am very thankful for all of the amazing work that goes into
this project! It has opened up so many doors for me! I am a long-time Spark
user and was very excited to start working with the push-based shuffle
service for an academic paper we are working on, but I encountered
Hello community,
Previously, in Spark 2.4, we listened for and captured ExternalCatalogEvent
in the onOtherEvent() method of SparkListener, but with Spark 3 we no longer
see those events.
Just wondering whether there is any behavior change in how
ExternalCatalogEvent is emitted in Spark 3, and if so, where should I
artitions
>
> This controls the number of files generated.
>
> On 28 Nov 2016 8:29 p.m., "Kevin Tran" <kevin...@gmail.com> wrote:
>
>> Hi Denny,
>> Thank you for your inputs. I also use 128 MB, but still too many files are
>> generated by the Spark app, which is only ~14
Ha's presentation
> Data
> Storage Tips for Optimal Spark Performance
> <https://spark-summit.org/2015/events/data-storage-tips-for-optimal-spark-performance/>.
>
>
> On Sun, Nov 27, 2016 at 9:44 PM Kevin Tran <kevin...@gmail.com> wrote:
Hi Everyone,
Does anyone know the best practice for writing Parquet files from Spark?
As the Spark app writes data to Parquet, it shows that under that directory
there are heaps of very small Parquet files (such as
e73f47ef-4421-4bcc-a4db-a56b110c3089.parquet). Each Parquet file is only
15KB
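Heaps of tiny files usually mean the DataFrame had many partitions at write time, since Spark writes roughly one file per partition. A minimal sketch of sizing the partition count before writing; the 128 MB target, the helper name, and the output path are assumptions for illustration, not from this thread:

```java
// Sketch: choose a partition count so each output file lands near a target
// size, then coalesce before writing. Spark writes ~one file per partition.
public class PartitionSizing {
    // Hypothetical helper: ceil(totalBytes / bytesPerFile), at least 1.
    public static int targetPartitions(long totalBytes, long bytesPerFile) {
        return (int) Math.max(1L, (totalBytes + bytesPerFile - 1) / bytesPerFile);
    }

    public static void main(String[] args) {
        long total = 14L * 1024 * 1024;      // ~14 MB of output, as in the thread
        long perFile = 128L * 1024 * 1024;   // ~128 MB per output file
        int parts = targetPartitions(total, perFile);
        System.out.println(parts);           // 1 -> a single output file
        // With Spark this would be applied as (path is hypothetical):
        // df.coalesce(parts).write().parquet("/path/out");
    }
}
```

coalesce() avoids a full shuffle; repartition() would also work when increasing parallelism matters more than write cost.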
Hi Everyone,
Does anyone know how we could extract the timestamp from a Kafka message in
Spark Streaming?
JavaPairInputDStream<String, String> messagesDStream =
KafkaUtils.createDirectStream(
ssc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
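With the 0.8-era direct stream API shown above, Kafka messages carry no broker-side timestamp at all (per-message timestamps only arrived in Kafka 0.10 with KIP-32, exposed as ConsumerRecord.timestamp() in spark-streaming-kafka-0-10). The usual workaround is to embed the timestamp in the payload at produce time. A sketch of the consumer side; the "epochMillis|body" payload format is an assumption for illustration:

```java
import java.time.Instant;

// Sketch: parse a producer-embedded timestamp out of the message value.
// Assumes payloads of the form "<epochMillis>|<body>", which is a convention
// invented here, not anything the Kafka 0.8 API provides.
public class EmbeddedTimestamp {
    public static Instant extractTimestamp(String payload) {
        int sep = payload.indexOf('|');
        return Instant.ofEpochMilli(Long.parseLong(payload.substring(0, sep)));
    }

    public static void main(String[] args) {
        String payload = "1473233400000|some message body";
        System.out.println(extractTimestamp(payload));
        // In the DStream this would run inside a map() over message values.
    }
}
```

If upgrading is an option, Kafka >= 0.10 with the spark-streaming-kafka-0-10 integration gives the timestamp directly from each ConsumerRecord.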
Hi Everyone,
I tried in cluster mode on YARN:
* spark-submit --jars /path/sqldriver.jar
* --driver-class-path
* spark-env.sh:
  SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/path/*"
* spark-defaults.conf:
  spark.driver.extraClassPath
  spark.executor.extraClassPath
None of them works for me!
Does
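One thing worth checking in the attempts listed above: the two extraClassPath keys take a value, and the jar must exist at that path on every node in cluster mode. A sketch of a complete spark-defaults.conf entry, reusing the jar path already named in this thread:

```properties
# spark-defaults.conf -- both keys need an explicit value; in YARN cluster
# mode the file must be readable at this path on the driver and executor nodes.
spark.driver.extraClassPath   /path/sqldriver.jar
spark.executor.extraClassPath /path/sqldriver.jar
```

Alternatively, passing the jar with `--jars` plus `--driver-class-path /path/sqldriver.jar` on the spark-submit line combines distribution and driver visibility.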
] INFO
org.apache.spark.executor.Executor - Finished task 0.0 in stage 12.0 (TID
12). 2518 bytes result sent to driver
Does anyone have any ideas?
On Wed, Sep 7, 2016 at 7:30 PM, Kevin Tran <kevin...@gmail.com> wrote:
> Hi Everyone,
> Does anyone know why call() function bei
Hi Everyone,
Does anyone know why the call() function is being called *3 times* for each
message that arrives
JavaDStream<String> message = messagesDStream.map(new
>> Function<Tuple2<String, String>, String>() {
>
> @Override
>
> public String call(Tuple2<String, String> tuple2) {
>
> return tuple2._2();
>
> }
>
>
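A common explanation for map()'s call() running three times per message is that each action (print, save, etc.) on an uncached DStream/RDD recomputes the whole lineage; the Spark-side fix would be messagesDStream.cache(). A plain-Java illustration of the mechanism, not Spark itself; the class and method names are invented for the example:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustration: a lazily computed value is re-executed once per consumer
// unless the result is cached -- the same pattern as multiple actions on an
// uncached transformation in Spark.
public class RecomputeDemo {
    // Run three "actions" over a transformation; return how many times the
    // transformation actually executed.
    public static int invocations(boolean cached) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> transform = () -> { calls.incrementAndGet(); return "x"; };
        if (cached) {
            String once = transform.get();                 // compute once, reuse
            for (int i = 0; i < 3; i++) { String s = once; }
        } else {
            for (int i = 0; i < 3; i++) transform.get();   // recompute per action
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(invocations(false));  // 3: uncached, once per action
        System.out.println(invocations(true));   // 1: cached, computed once
    }
}
```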
Hi everyone,
Please give me your opinions on what the best ID generator is for the ID
field in Parquet:
UUID.randomUUID();
AtomicReference<Long> currentTime = new
AtomicReference<>(System.currentTimeMillis());
AtomicLong counter = new AtomicLong(0);
Thanks,
Kevin.
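A sketch contrasting the two options named in the question. UUIDs are collision-safe without any coordination but random (unordered, 36 chars); a timestamp-plus-counter ID is compact and ordered but only unique within one JVM unless a worker ID is mixed in. The 20-bit counter split below is an assumption for illustration, not a standard:

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the two ID schemes from the question.
public class IdGenerators {
    private static final AtomicLong counter = new AtomicLong(0);

    // Globally unique, no coordination needed; random order.
    public static String uuidId() {
        return UUID.randomUUID().toString();
    }

    // Ordered numeric ID: epoch millis in the high bits, a 20-bit counter in
    // the low bits. Unique only within this JVM; clock rollback breaks order.
    public static long timeCounterId() {
        return (System.currentTimeMillis() << 20) | (counter.incrementAndGet() & 0xFFFFF);
    }

    public static void main(String[] args) {
        System.out.println(uuidId());
        long a = timeCounterId();
        long b = timeCounterId();
        System.out.println(a < b);   // true: IDs increase monotonically
    }
}
```

For IDs generated across many executors, UUIDs are the safer default; ordered numeric IDs need a per-worker component (as in Snowflake-style schemes) to stay unique.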
Hi,
Does anyone know the best practices for storing data in Parquet files?
Does a Parquet file have a size limit (1 TB)?
Should we use SaveMode.Append for a long-running streaming app?
How should we store it in HDFS (directory structure, ...)?
Thanks,
Kevin.
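On the directory-structure part of the question: the common answer is the Hive-style partitioned layout that Spark's partitionBy() produces, which lets readers prune whole directories by partition column. A sketch of that layout as a plain path builder; the base path and column names are assumptions for illustration:

```java
import java.time.LocalDate;

// Sketch of the Hive-style partition layout (key=value directories) that
// Spark's DataFrameWriter.partitionBy() produces on HDFS.
public class PartitionPath {
    public static String partitionDir(String base, LocalDate d) {
        return String.format("%s/year=%d/month=%02d/day=%02d",
                base, d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }

    public static void main(String[] args) {
        System.out.println(partitionDir("/data/events", LocalDate.of(2016, 9, 7)));
        // With Spark this layout comes from:
        // df.write().partitionBy("year", "month", "day").parquet("/data/events");
    }
}
```

Partition columns should be low-cardinality (dates, regions); partitioning on a high-cardinality key recreates the small-files problem discussed earlier in this thread.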
Hi,
I wrote to a Parquet file as follows:
+--------------------------+
|word                      |
+--------------------------+
|THIS IS MY CHARACTERS ... |
|// ANOTHER LINE OF CHAC...|
+--------------------------+
These lines are not the full text; they are being trimmed down.
Does anyone know how many characters StringType
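The trailing "..." above is almost certainly console truncation, not data loss: StringType has no small fixed length limit, but show() shortens any cell longer than 20 characters. A sketch of that truncation rule (keep the first width-3 characters and append "...", which matches Spark's display behavior to the best of my knowledge); show(false) disables it:

```java
// Sketch of the cell truncation show() applies when rendering a DataFrame.
// The stored Parquet data is untouched; only the printed table is shortened.
public class ShowTruncation {
    public static String truncateCell(String s, int width) {
        return (s.length() > width) ? s.substring(0, width - 3) + "..." : s;
    }

    public static void main(String[] args) {
        String full = "THIS IS MY CHARACTERS AND MORE TEXT";
        System.out.println(truncateCell(full, 20));   // THIS IS MY CHARAC...
    }
}
```

To confirm the data is intact, compare length("word") in a query against the lengths seen in the truncated display.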
Hi Everyone,
Does anyone know how to write a Parquet file after parsing data in Spark
Streaming?
Thanks,
Kevin.
Hi guys,
I'm hoping that someone can help me make my setup more efficient. I'm trying
to do record linkage across 2.5 billion records and have set myself up in
Spark to handle the data. Right now, I'm relying on R (with the stringdist
and RecordLinkage packages) to do the actual
) and submitting Spark application to Yarn
in cluster mode.
Any help is appreciated.
--
Linh M. Tran
Hello All,
I have several Spark Streaming applications running on Standalone mode in
Spark 1.5. Spark is currently set up for dynamic resource allocation. The
issue I am seeing is that I can have about 12 Spark Streaming Jobs running
concurrently. Occasionally I would see more than half where
Hello all,
I am currently hitting an error with Spark SQL accessing Elasticsearch
through the Elasticsearch Spark integration. Below is the series of commands
I issued along with the stacktrace. I am unclear what the error could mean. I
can print the schema correctly but it errors out if I try to display a