[jira] Lantao Jin shared "SPARK-20680: Spark-sql do not support for void column datatype of view" with you

2017-05-09 Thread Lantao Jin (JIRA)
Lantao Jin shared an issue with you




> Spark-sql do not support for void column datatype of view
> -
>
> Key: SPARK-20680
> URL: https://issues.apache.org/jira/browse/SPARK-20680
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Lantao Jin
>
> Create a Hive table with an untyped NULL column:
> {quote}
> hive> create table bad as select 1 x, null z from dual;
> {quote}
> Because the NULL literal has no type, Hive gives column z the VOID type:
> {quote}
> hive> describe bad;
> OK
> x int 
> z void
> {quote}
> In Spark 2.0.x, reading this table works as expected:
> {quote}
> spark-sql> describe bad;
> x    int     NULL
> z    void    NULL
> Time taken: 4.431 seconds, Fetched 2 row(s)
> {quote}
> But in Spark 2.1.x, it fails with SparkException: Cannot recognize hive type
> string: void
> {quote}
> spark-sql> describe bad;
> 17/05/09 03:12:08 INFO execution.SparkSqlParser: Parsing command: describe bad
> 17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: int
> 17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: void
> 17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
> org.apache.spark.SparkException: Cannot recognize hive type string: void
> at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
> at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
> Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
> DataType void() is not supported.(line 1, pos 0)
> == SQL ==  
> void   
> ^^^
> ... 61 more
> org.apache.spark.SparkException: Cannot recognize hive type string: void
> {quote}
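A minimal sketch of a possible workaround, assuming the untyped NULL literal is what makes Hive record the VOID type: give the column an explicit type when the table is created, so Spark 2.1.x never has to parse "void". The table name below is hypothetical, not taken from the report:

// Hypothetical workaround, mirroring the repro above but with an explicit type
// for the NULL column so the catalog never stores "void".
spark.sql("CREATE TABLE bad_fixed AS SELECT 1 AS x, CAST(NULL AS STRING) AS z")
spark.sql("DESCRIBE bad_fixed").show()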

 Also shared with
  u...@spark.apache.org






Faster Spark on ORC with Apache ORC

2017-05-09 Thread Dong Joon Hyun
Hi, All.

Apache Spark has always been a fast and general engine, and since SPARK-2883
it has supported Apache ORC inside the `sql/hive` module with a Hive
dependency.

With Apache ORC 1.4.0 (released yesterday), we can make Spark on ORC faster
and gain the following benefits:

- Speed: Use both Spark `ColumnarBatch` and ORC `RowBatch` together, which
means full vectorization support.

- Stability: Apache ORC 1.4.0 already has many fixes, and we can depend on
the ORC community's effort in the future.

- Usability: Users can use `ORC` data sources without the hive module
(-Phive); see the sketch after this list.

- Maintainability: Reduce the Hive dependency and eventually remove some
old legacy code from the `sql/hive` module.
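
To make the usability point concrete, below is a minimal sketch of how the new data source could be exercised from the DataFrame API. It assumes a build without -Phive, the usual `orc` short format name, and a throwaway /tmp path; none of these names come from the PR itself.

import org.apache.spark.sql.SparkSession

object OrcRoundTrip {
  def main(args: Array[String]): Unit = {
    // Plain SparkSession; no Hive support should be needed for the new source.
    val spark = SparkSession.builder().appName("orc-round-trip").getOrCreate()
    import spark.implicits._

    // Write a tiny DataFrame as ORC ...
    Seq((1, "a"), (2, "b")).toDF("id", "name")
      .write.format("orc").mode("overwrite").save("/tmp/orc-demo")

    // ... and read it back through the same data source.
    spark.read.format("orc").load("/tmp/orc-demo").show()

    spark.stop()
  }
}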

As a first step, I made a PR adding a new ORC data source into `sql/core` 
module.

https://github.com/apache/spark/pull/17924  (+ 3,691 lines, -0)

Could you give some opinions on this approach?

Bests,
Dongjoon.


Re: Any plans for making StateStore pluggable?

2017-05-09 Thread Tathagata Das
Thank you for creating the JIRA. I am working towards making it
configurable very soon.

On Tue, May 9, 2017 at 4:12 PM, Yogesh Mahajan 
wrote:

> Hi Team,
>
> Any plans to make the StateStoreProvider/StateStore in structured
> streaming pluggable?
> Currently StateStore#loadedProviders has only one HDFSBackedStateStoreProvider
> and it's not configurable.
> If we make this configurable, users can bring in their own implementation
> of StateStore.
>
> Please refer to this ticket - https://issues.apache.org/jira/browse/SPARK-20376
>
> Thanks,
> http://www.snappydata.io/blog 
>


Any plans for making StateStore pluggable?

2017-05-09 Thread Yogesh Mahajan
Hi Team,

Any plans to make the StateStoreProvider/StateStore in structured streaming
pluggable?
Currently StateStore#loadedProviders has only
one HDFSBackedStateStoreProvider and it's not configurable.
If we make this configurable, users can bring in their own implementation
of StateStore.
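
For illustration, here is a sketch of the kind of hook being asked for: a session config naming the provider class, so a custom implementation gets loaded instead of the HDFS-backed one. Both the config key and the provider class below are assumptions, not an existing API.

import org.apache.spark.sql.SparkSession

// Hypothetical: pick the StateStoreProvider implementation by class name.
// Neither the config key nor the example class exists today; this is only
// the shape of the pluggability being requested.
val spark = SparkSession.builder()
  .appName("custom-state-store")
  .config("spark.sql.streaming.stateStore.providerClass",
          "com.example.state.MyStateStoreProvider")
  .getOrCreate()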

Please refer to this ticket -
https://issues.apache.org/jira/browse/SPARK-20376

Thanks,
http://www.snappydata.io/blog 


Submitting SparkR to CRAN

2017-05-09 Thread Shivaram Venkataraman
Closely related to the PyPi upload thread (https://s.apache.org/WLtM), I
just wanted to give a heads up that we are working on submitting SparkR
from Spark 2.1.1 as a package to CRAN. The package submission is under
review with CRAN right now and I will post updates to this thread.

The main ticket tracking this effort is SPARK-15799, and I'll also create a
new PR on the website describing how to update the package with a new release.

Many thanks to everybody who helped with this effort!

Thanks
Shivaram


Parquet vectorized reader DELTA_BYTE_ARRAY

2017-05-09 Thread andreiL
Hi, I am getting an exception in Spark 2.1 when reading Parquet files in which
some columns are DELTA_BYTE_ARRAY encoded.

java.lang.UnsupportedOperationException: Unsupported encoding:
DELTA_BYTE_ARRAY

Is this exception by design, or am I missing something?

If I turn off the vectorized reader, reading these files works fine.
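
For reference, turning the vectorized reader off can be done per session. This is only a sketch; it assumes the standard `spark.sql.parquet.enableVectorizedReader` flag is the one being toggled, and the path is a placeholder:

// Fall back to the non-vectorized Parquet reader so DELTA_BYTE_ARRAY-encoded
// columns can be read (at the cost of the vectorized path's speed).
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

// Placeholder path; point it at the affected files.
val df = spark.read.parquet("/path/to/delta_byte_array_files")
df.show()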

AndreiL






Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-05-09 Thread Kazuaki Ishizaki
+1 (non-binding)

I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for 
core have passed.

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
$ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 package install
$ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
...
Run completed in 15 minutes, 12 seconds.
Total number of tests run: 1940
Suites: completed 206, aborted 0
Tests: succeeded 1940, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:51 min
[INFO] Finished at: 2017-05-09T17:51:04+09:00
[INFO] Final Memory: 53M/514M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hive" could not be activated because it 
does not exist.


Kazuaki Ishizaki,



From:   Michael Armbrust 
To: "dev@spark.apache.org" 
Date:   2017/05/05 02:08
Subject:[VOTE] Apache Spark 2.2.0 (RC2)



Please vote on releasing the following candidate as Apache Spark version 
2.2.0. The vote is open until Tues, May 9th, 2017 at 12:00 PST and passes 
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.2.0-rc2 (1d4017b44d5e6ad156abeaae6371747f111dd1f9)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1236/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc2-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an
existing Spark workload, running it on this release candidate, and then
reporting any regressions.
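
As one concrete way to do that, the staging repository above can be added to an existing build so the RC artifacts are picked up. This is only a sketch for an sbt project; the module and Scala version are whatever your workload already uses:

// build.sbt fragment: resolve the 2.2.0 RC2 artifacts from the staging repository.
resolvers += "Apache Spark 2.2.0 RC2 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1236/"

// Swap in the Spark module(s) your workload depends on.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"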

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked 
on immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release 
unless the bug in question is a regression from 2.1.1.