Re: Unsubscribe

2020-12-22 Thread Wesley Peng

Bhavya Jain wrote:

Unsubscribe


Please send an email to user-unsubscr...@spark.apache.org to
unsubscribe yourself from the list. Thanks.


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Unsubscribe

2020-12-22 Thread Bhavya Jain
Unsubscribe


Re: Printing Logs in map-partition

2020-12-22 Thread lec ssmi
The logs printed in the map function live on the worker nodes. You can
access them there directly, or browse them through the web UI.
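
For example, a minimal sketch (assuming log4j 1.x on the classpath, as
bundled with Spark). Both log lines below land in the executor's
stdout/stderr, visible under the Executors tab of the web UI, not on the
driver console (unless you run with --master local[*]):

import org.apache.log4j.LogManager
import org.apache.spark.sql.SparkSession

object MapPartitionsLogging {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("MapPartitionsLogging").getOrCreate()

    spark.sparkContext.parallelize(1 to 100, numSlices = 4)
      .mapPartitions { iter =>
        // Create the logger inside the closure: it runs on the executor,
        // so it must not be captured from the driver (loggers are not
        // serializable).
        val log = LogManager.getLogger("mapPartitionsDemo")
        log.info("processing a partition")
        // println goes to the executor's stdout, not the driver console.
        println("partition started")
        iter.map(_ * 2)
      }
      .count()

    spark.stop()
  }
}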

abby37  于2020年12月23日周三 下午1:53写道:

> I want to print some logs in the mapPartitions transformation to log the
> internal workings of the function.
> I have used the following techniques without any success:
> 1. System.out.println()
> 2. System.err.println()
> 3. Log4j - logger.info
> 4. Log4j - logger.debug
>
> My code for mapPartitions is similar to this:
> https://github.com/broadinstitute/gatk/blob/master/src/main/java/org/broadinstitute/hellbender/tools/spark/bwa/BwaSparkEngine.java#L103
>
> Is there any way to print logs to the console in mapPartitions? Thanks in
> advance for your time and help.
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Printing Logs in map-partition

2020-12-22 Thread abby37
I want to print some logs in the mapPartitions transformation to log the
internal workings of the function.
I have used the following techniques without any success:
1. System.out.println()
2. System.err.println()
3. Log4j - logger.info
4. Log4j - logger.debug

My code for mapPartitions is similar to this:
https://github.com/broadinstitute/gatk/blob/master/src/main/java/org/broadinstitute/hellbender/tools/spark/bwa/BwaSparkEngine.java#L103

Is there any way to print logs to the console in mapPartitions? Thanks in
advance for your time and help.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



DataSourceV2 with ordering pushdown

2020-12-22 Thread Kohki Nishio
I'm trying to connect Spark to Lucene indices and noticed that I can't
really tell Spark what ordering to expect from my Batch / PartitionReader.

Spark ends up retrieving all the rows and then sorting them whenever there
is an orderBy. Is there any way I can tell Spark that a partition is
already ordered? Is working with a physical plan the only way to achieve
this?
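
Ideally I could declare the ordering on the Scan itself, along these lines
(purely a hypothetical sketch: I don't see such a hook in the current DSv2
API, and SupportsReportOrdering is just an illustrative name modeled on
SupportsReportPartitioning):

import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.types.StructType

// Hypothetical mix-in letting a source declare the per-partition sort
// order of the rows its PartitionReaders emit, so the planner could
// skip the global sort for a matching orderBy.
trait SupportsReportOrdering extends Scan {
  def outputOrdering(): Array[String] // e.g. Array("timestamp ASC")
}

class LuceneScan(schema: StructType) extends Scan with SupportsReportOrdering {
  override def readSchema(): StructType = schema
  override def outputOrdering(): Array[String] = Array("timestamp ASC")
}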

Thanks
-- 
Kohki Nishio


Re: Integrating multiple streaming sources

2020-12-22 Thread Artemis User
Hmm, it looks like Spark 2.3+ does support stream-to-stream joins, but the
online docs don't provide any examples. If anyone could provide a concrete
reference, I'd really appreciate it. Thanks! -- ND
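
P.S. The closest I've pieced together from the programming guide is the
following (an untested sketch: the two rate sources are just stand-ins for
real inputs, and the 10-second watermarks and 15-second join window are
arbitrary):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("StreamToStreamJoin").getOrCreate()

// Two independent streaming sources.
val left = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
  .withColumnRenamed("timestamp", "leftTime")
  .withColumnRenamed("value", "leftValue")
  .withWatermark("leftTime", "10 seconds")

val right = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
  .withColumnRenamed("timestamp", "rightTime")
  .withColumnRenamed("value", "rightValue")
  .withWatermark("rightTime", "10 seconds")

// Stream-to-stream inner join with a time-range condition, directed to a
// single sink.
val joined = left.join(
  right,
  expr("""leftValue = rightValue AND
          rightTime BETWEEN leftTime AND leftTime + interval 15 seconds"""))

joined.writeStream.format("console").start().awaitTermination()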


On 12/22/20 9:57 AM, Artemis User wrote:
Is there any way to integrate/fuse multiple streaming sources into a
single stream process? In other words, the current structured streaming
API dictates a single streaming source and sink. We'd like to have a
stream process that interfaces with multiple stream sources, performs a
join, and directs the result to a single sink. Is this possible? Thanks!


-- ND


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Integrating multiple streaming sources

2020-12-22 Thread Artemis User
Is there any way to integrate/fuse multiple streaming sources into a
single stream process? In other words, the current structured streaming
API dictates a single streaming source and sink. We'd like to have a
stream process that interfaces with multiple stream sources, performs a
join, and directs the result to a single sink. Is this possible? Thanks!


-- ND


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[Spark Streaming] support of non-time-based windows and lag function

2020-12-22 Thread Moser, Michael
Hi,

I have a question regarding Spark structured streaming:
will non-time-based window operations like the lag function be supported at
some point, or is this not on the table due to technical difficulties?

I.e. will something like this be possible in the future:
w = Window.partitionBy('uid').orderBy('timestamp')
df = df.withColumn('lag', lag(df['col_A']).over(w))

This would be useful in streaming applications where we look for "patterns"
based on the occurrence of multiple events (rows) in a particular order.
A different, more general way to achieve the same thing would be to use
joins, but that would require a streaming-ready uid that is monotonically
increasing and (!) safe to generate concurrently.
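
In the meantime, the only workaround I can think of is
flatMapGroupsWithState. Below is a rough sketch that emulates lag over uid
(assuming, unrealistically, that events for a key arrive in order across
micro-batches; the rate source and the value-derived uid are just
stand-ins):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(uid: String, timestamp: java.sql.Timestamp, colA: String)
case class WithLag(uid: String, timestamp: java.sql.Timestamp,
                   colA: String, lagColA: Option[String])

object LagEmulation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("LagEmulation").getOrCreate()
    import spark.implicits._

    // Stand-in source; a real job would read from Kafka or similar.
    val events = spark.readStream.format("rate").load()
      .selectExpr("CAST(value % 10 AS STRING) AS uid", "timestamp",
                  "CAST(value AS STRING) AS colA")
      .as[Event]

    val withLag = events
      .groupByKey(_.uid)
      .flatMapGroupsWithState(OutputMode.Append, GroupStateTimeout.NoTimeout) {
        (uid: String, rows: Iterator[Event], state: GroupState[String]) =>
          // State holds colA of the previous event seen for this uid.
          rows.toList.sortBy(_.timestamp.getTime).map { e =>
            val prev = state.getOption
            state.update(e.colA)
            WithLag(uid, e.timestamp, e.colA, prev)
          }.iterator
      }

    withLag.writeStream.format("console").outputMode("append")
      .start().awaitTermination()
  }
}

This keeps only the previous colA per uid in state, but it does not solve
out-of-order arrival across batches, which is exactly why a native lag over
a streaming window would be nice.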

Thanks a lot & best,
Michael