[jira] [Commented] (HIVE-8853) Make vectorization work with Spark [Spark Branch]

2015-01-12 Thread Jimmy Xiang (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273792#comment-14273792 ]

Jimmy Xiang commented on HIVE-8853:
---

I looked into it and found that vectorization relies heavily on the 
Map/Reduce work cache. I re-opened HIVE-9135.

 Make vectorization work with Spark [Spark Branch]
 -

        Key: HIVE-8853
        URL: https://issues.apache.org/jira/browse/HIVE-8853
    Project: Hive
 Issue Type: Sub-task
 Components: Spark
   Reporter: Xuefu Zhang
   Assignee: Jimmy Xiang

 For vectorization to work in Hive, the reader also needs to be vectorized, 
 which means that the reader reads a chunk of rows (or a list of column 
 chunks) instead of one row at a time. However, we use Spark RDDs for reading, 
 which in turn use the underlying InputFormat to read. Subsequent 
 processing also needs to happen in batches. We need to make sure that 
 vectorization works as expected.
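
 To illustrate the difference between reading one row at a time and reading 
 a chunk of rows, here is a minimal sketch. It is not Hive or Spark code: the 
 class BatchReaderSketch and the helper batched() are hypothetical names made 
 up for this example, and plain lists stand in for the columnar batches a 
 real vectorized reader would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not part of Hive or Spark.
public class BatchReaderSketch {

    // Wraps a row-at-a-time iterator so that callers receive
    // up to batchSize rows per call instead of a single row.
    static <T> Iterator<List<T>> batched(final Iterator<T> rows, final int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return rows.hasNext();
            }

            @Override
            public List<T> next() {
                if (!rows.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> batch = new ArrayList<T>(batchSize);
                // Fill the batch until it is full or the input is exhausted.
                while (batch.size() < batchSize && rows.hasNext()) {
                    batch.add(rows.next());
                }
                return batch;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = Arrays.asList(1, 2, 3, 4, 5).iterator();
        Iterator<List<Integer>> batches = batched(rows, 2);
        while (batches.hasNext()) {
            // Downstream processing sees a whole batch at once.
            System.out.println(batches.next());
        }
    }
}
```

 With five rows and a batch size of 2, the loop in main() sees three 
 batches, the last one holding the single remaining row.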



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8853) Make vectorization work with Spark [Spark Branch]

2015-01-08 Thread Brock Noland (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270418#comment-14270418 ]

Brock Noland commented on HIVE-8853:


[~jxiang] I took some thread dumps of an executor JVM during execution with 
vectorization turned on, and many of them looked like the ones below.

{noformat}
Executor task launch worker-4 daemon prio=10 tid=0x7f8394048800 nid=0x707a runnable [0x7f8457dfb000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked 0x000281c57b70 (a sun.nio.ch.Util$2)
    - locked 0x000281c57b80 (a java.util.Collections$UnmodifiableSet)
    - locked 0x000281c57b28 (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:186)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
    - locked 0x000718424118 (a org.apache.hadoop.hdfs.RemoteBlockReader2)
    at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
    - eliminated 0x0007184169b8 (a org.apache.hadoop.hdfs.DFSInputStream)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
    - locked 0x0007184169b8 (a org.apache.hadoop.hdfs.DFSInputStream)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.fill(Input.java:146)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.require(Input.java:178)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.readUtf8_slow(Input.java:542)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.readUtf8(Input.java:535)
    at org.apache.hive.com.esotericsoftware.kryo.io.Input.readString(Input.java:465)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:171)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:160)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at