Spark usage help

2021-09-01 Thread yinghua...@163.com
Hi:
I found that the following methods are used when setting parameters to 
create a sparksession access hive table
1) hive.execution.engine:spark
spark = SparkSession.builder()
  .appName("get data from hive")
  .config("hive.execution.engine", "spark")
  .enableHiveSupport()
  .getOrCreate()
2) spark.sql.warehouse.dir:warehouseLocation
spark = SparkSession.builder()
  .appName("get data from hive")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

What is the difference between the above two parameter settings?


[Spark JDBC] Spark jdbc on SQL server failed with filter and disable pushDownPredicate not work

2021-09-01 Thread Xiaojin Wang
Hi guys,

I recently met with an error with 'JDBC_PUSHDOWN_PREDICATE' option not work.  
The background is

val url = "jdbc:sqlserver://XX"
val properties = new Properties
val df = spark.read.jdbc(url, "movies", properties)
df.filter("rated == true").show()

I am using this code to read from SQL server do transformations with filter. 
However this way I met with an expcetion:
Job aborted due to stage failure. Caused by: SQLServerException: Invalid column 
name 'true'.

The original table contains a 'bit' data type 'rated'. Digging into the code, I 
found 'bit' will be translate to Boolean type. Following the pushdown logic, in 
MSSqlserverDialect compileValue() method, the Boolean value is translated to 
'true'/'false' which doesn't match TSQL language '1'/'0'. And finally caused 
this issue.

After figuring out the issue, I tried to use 'pushDownPredicate' options to 
avoid pushing down the filter logic into SQL query, the code is like

val url = "jdbc:sqlserver://XX"
val properties = new Properties
properties.setProperty(JDBCOptions.JDBC_PUSHDOWN_PREDICATE, "false") add but 
still not work
val df = spark.read.jdbc(url, "movies", properties)
df.filter("rated == true").show()

However it still failed with the same error message. Seems the pushdown false 
is not working at all. So the question is why the pushdownPredicate option is 
not work as expected and if there is other mitigations to fix this issue.


Best,
Xiaojin



question regarding spark streaming continuous processing

2021-09-01 Thread Antonio Si
Hi,

Hi all, I have a couple questions regarding continuous processing:

1.  What is the plan for continuous processing moving forward? Will this 
eventually be released as a production feature as it seems it is still 
experimental? 
2.  In microbatch streaming, there is a StreamingQueryListener and we will be 
able to obtain the kafka offset of the last processed microbatch. Do we have 
anything similar for continuous processing?

Any information would be helpful.

Thanks and regards,

Antonio.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Who can reply me about SPARK-36584

2021-09-01 Thread Zhenyu Hu
Hi All
I recently raised an issue to the community about Spark's dynamic
scaling. When there is a broadcast and dynamic scaling is enabled at the
same time, the driver will send a BlockUpdate message when the broadcast is
completed. ExecutorMonitor#OnBlockUpdate will receive this event and add
the driver as an executor to its data structure. I think this is a bug
because the driver should not be treated as an executor. Can anyone answer
me and confirm this question?

The JIRA: https://issues.apache.org/jira/browse/SPARK-36584


Re: Connection reset by peer : failed to remove cache rdd

2021-09-01 Thread Harsh Sharma



On 2021/08/30 13:32:19, Jacek Laskowski  wrote: 
> Hi,
> 
> No idea what might be going on here, but I'd not worry much about it and
> simply monitor disk usage as some broadcast blocks might have left over.
> 
> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? Do you use broadcast variables or are the errors
> coming from broadcast joins perhaps?
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books 
> Follow me on https://twitter.com/jaceklaskowski
> 
> 
> 
> 
> On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma 
> wrote:
> 
> > We are facing issue in production where we are getting frequent
> >
> > Still have 1 request outstanding when connection with the hostname was
> > closed
> >
> > connection reset by peer : errors as well as warnings  : failed to remove
> > cache rdd or failed  to remove broadcast variable.
> >
> > Please help us how to mitigate this  :
> >
> > Executor memory : 12g
> >
> > Network timeout :   60
> >
> > Heartbeat interval : 25
> >
> >
> >
> > [Stage 284:>(199 + 1) / 200][Stage 292:>  (1 + 3)
> > / 200]
> > [Stage 284:>(199 + 1) / 200][Stage 292:>  (2 + 3)
> > / 200]
> > [Stage 292:>  (2 + 4)
> > / 200][14/06/21 10:46:17,006 WARN
> > shuffle-server-4](TransportChannelHandler) Exception in connection from
> > 
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > at
> > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > at
> > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > at
> > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > at
> > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > at java.lang.Thread.run(Thread.java:748)
> > [14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler)
> > Still have 1 requests outstanding when connection from  is closed
> > [14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error
> > cleaning broadcast 159
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > at
> > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > at
> > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > at
> > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > at
> > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > at java.lang.Thread.run(Thread.java:748)
> > [14/06/21 10:46:17,012 WARN
> > block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove
> > broadcast 159 with removeFromMaster = true - Connection reset by peer
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelIm

Re: Connection reset by peer : failed to remove cache rdd

2021-09-01 Thread Harsh Sharma
Please Find reply : 
Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? 

ans :its Spark SQL

Do you use broadcast variables ?

ans : yes we are using broadcast variables
 or are the errors
 coming from broadcast joins perhaps? 
ans :we are not using Boardcast join

On 2021/08/30 13:32:19, Jacek Laskowski  wrote: 
> Hi,
> 
> No idea what might be going on here, but I'd not worry much about it and
> simply monitor disk usage as some broadcast blocks might have left over.
> 
> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? Do you use broadcast variables or are the errors
> coming from broadcast joins perhaps?
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books 
> Follow me on https://twitter.com/jaceklaskowski
> 
> 
> 
> 
> On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma 
> wrote:
> 
> > We are facing issue in production where we are getting frequent
> >
> > Still have 1 request outstanding when connection with the hostname was
> > closed
> >
> > connection reset by peer : errors as well as warnings  : failed to remove
> > cache rdd or failed  to remove broadcast variable.
> >
> > Please help us how to mitigate this  :
> >
> > Executor memory : 12g
> >
> > Network timeout :   60
> >
> > Heartbeat interval : 25
> >
> >
> >
> > [Stage 284:>(199 + 1) / 200][Stage 292:>  (1 + 3)
> > / 200]
> > [Stage 284:>(199 + 1) / 200][Stage 292:>  (2 + 3)
> > / 200]
> > [Stage 292:>  (2 + 4)
> > / 200][14/06/21 10:46:17,006 WARN
> > shuffle-server-4](TransportChannelHandler) Exception in connection from
> > 
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > at
> > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > at
> > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > at
> > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > at
> > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > at java.lang.Thread.run(Thread.java:748)
> > [14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler)
> > Still have 1 requests outstanding when connection from  is closed
> > [14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error
> > cleaning broadcast 159
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > at
> > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > at
> > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > at
> > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > at
> > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > at java.lang.Thread.run(Thread.java:748)
> > [14/06/21 10:46:17,012 WARN
> > block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove
> > broadcast 159 with removeFromMaster = true - Connection reset by peer
> > java.io.IOException: Connection reset by pee

Spark Phoenix Connection Exception while loading from Phoenix tables

2021-09-01 Thread Harsh Sharma
[01/09/21 11:55:51,861 WARN  pool-1-thread-1](Client) Exception encountered 
while connecting to the server : java.lang.NullPointerException 
[01/09/21 11:55:51,862 WARN  pool-1-thread-1](Client) Exception encountered 
while connecting to the server : java.lang.NullPointerException 

[01/09/21 11:55:51,862 WARN  pool-1-thread-1](RetryInvocationHandler) Exception 
while invoking class 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo
 over server 
Not retrying because failovers (15) exceeded maximum allowed (15) 
java.io.IOException: Failed on local exception: java.io.IOException: 
java.lang.NullPointerException; Host Details : local host is:xx1 
destination host is: xx2
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305



-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Unsubscribe

2021-09-01 Thread 孙乾(亨贞)
Unsubscribe