[jira] [Commented] (HIVE-8853) Make vectorization work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273792#comment-14273792 ]

Jimmy Xiang commented on HIVE-8853:
---
Looked into it and found that vectorization uses the Map/Reduce work cache a lot. I re-opened HIVE-9135.

Make vectorization work with Spark [Spark Branch]

Key: HIVE-8853
URL: https://issues.apache.org/jira/browse/HIVE-8853
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang

In Hive, to make vectorization work, the reader also needs to be vectorized, which means that the reader can read a chunk of rows (or a list of column chunks) instead of one row at a time. However, we use Spark RDD for reading, which in turn uses the underlying InputFormat to read. Subsequent processing also needs to happen in batches. We need to make sure that vectorization is working as expected.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
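The difference between reading one row at a time and reading a chunk of rows, as described in the issue, can be illustrated with a minimal Java sketch. All names here (BatchReadSketch, LongColumnBatch, BatchReader, nextBatch) are hypothetical and chosen for illustration only; they are not Hive or Spark classes, and the in-memory array merely stands in for whatever the underlying input format would supply.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchReadSketch {

    /** Hypothetical column batch: a reusable primitive array plus the number of valid rows. */
    public static final class LongColumnBatch {
        final long[] values;
        int size;

        LongColumnBatch(int capacity) {
            this.values = new long[capacity];
        }
    }

    /** Hypothetical batch reader over an in-memory column (assumption: stands in for a vectorized reader). */
    public static final class BatchReader {
        private final long[] source;
        private int pos = 0;

        BatchReader(long[] source) {
            this.source = source;
        }

        /** Fills the batch with up to its capacity in rows; returns false once the source is exhausted. */
        boolean nextBatch(LongColumnBatch batch) {
            if (pos >= source.length) {
                batch.size = 0;
                return false;
            }
            int n = Math.min(batch.values.length, source.length - pos);
            System.arraycopy(source, pos, batch.values, 0, n);
            batch.size = n;
            pos += n;
            return true;
        }
    }

    /** Row-at-a-time processing: one boxed object is handed over per row. */
    public static long sumRowAtATime(long[] column) {
        long sum = 0;
        for (long v : column) {
            Long row = v;   // per-row object, as a row-based reader would produce
            sum += row;
        }
        return sum;
    }

    /** Batch processing: the reader hands over a chunk, and the inner loop runs over a primitive array. */
    public static long sumInBatches(long[] column, int batchSize, List<Integer> batchSizesSeen) {
        BatchReader reader = new BatchReader(column);
        LongColumnBatch batch = new LongColumnBatch(batchSize);
        long sum = 0;
        while (reader.nextBatch(batch)) {
            batchSizesSeen.add(batch.size);
            for (int i = 0; i < batch.size; i++) {
                sum += batch.values[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] column = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        List<Integer> sizes = new ArrayList<>();
        System.out.println("row-at-a-time sum: " + sumRowAtATime(column));
        System.out.println("batched sum:       " + sumInBatches(column, 4, sizes));
        System.out.println("batch sizes:       " + sizes);
    }
}
```

Both paths compute the same result; the point of the sketch is only that the batched path needs the reader and every subsequent operator to agree on the batch as the unit of work, which is the requirement the issue raises for the Spark branch.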
[jira] [Commented] (HIVE-8853) Make vectorization work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270418#comment-14270418 ]

Brock Noland commented on HIVE-8853:

[~jxiang] I took some thread dumps of an executor JVM during execution with vectorization turned on and I saw a ton of thread dumps here like the ones below.

{noformat}
Executor task launch worker-4 daemon prio=10 tid=0x7f8394048800 nid=0x707a runnable [0x7f8457dfb000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked 0x000281c57b70 (a sun.nio.ch.Util$2)
    - locked 0x000281c57b80 (a java.util.Collections$UnmodifiableSet)
    - locked 0x000281c57b28 (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:186)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
    - locked 0x000718424118 (a org.apache.hadoop.hdfs.RemoteBlockReader2)
    at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
    - eliminated 0x0007184169b8 (a org.apache.hadoop.hdfs.DFSInputStream)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
    - locked 0x0007184169b8 (a org.apache.hadoop.hdfs.DFSInputStream)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.fill(Input.java:146)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.require(Input.java:178)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.readUtf8_slow(Input.java:542)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.readUtf8(Input.java:535)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.readString(Input.java:465)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:171)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:160)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at