>
> Jacek
>
> On 29 Dec 2016 5:03 p.m., "Yin Huai" <yh...@databricks.com> wrote:
>
>> Hi all,
>>
>> Apache Spark 2.1.0 is the second release of the Spark 2.x line. This release
>> makes significant strides in the production readiness of Structured
Hello,
We spent some time preparing artifacts and changes to the website (including
the release notes). I just sent out the announcement. 2.1.0 is
officially released.
Thanks,
Yin
On Wed, Dec 28, 2016 at 12:42 PM, Justin Miller <
justin.mil...@protectwise.com> wrote:
> Interesting, because
Hi all,
Apache Spark 2.1.0 is the second release of the Spark 2.x line. This release
makes significant strides in the production readiness of Structured
Streaming, with added support for event time watermarks
Hello Michael,
Thank you for reporting this issue. It will be fixed by
https://github.com/apache/spark/pull/16080.
Thanks,
Yin
On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao wrote:
> Hi!
>
> Do you have real HIVE installation?
> Have you built Spark 2.1 & Spark 2.0 with
Hi, we made the change because the partitioning discovery logic was too
flexible and it introduced problems that were very confusing to users. To
make your case work, we have introduced a new data source option called
basePath. You can use
DataFrame df =
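A minimal sketch of the basePath option in Scala (the paths are hypothetical):
// Treat /data/table as the table root so partition columns are still
// discovered when reading only one partition directory.
val df = sqlContext.read
  .option("basePath", "/data/table")
  .parquet("/data/table/year=2015/month=01")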
No problem! Glad it helped!
On Thu, Jan 7, 2016 at 12:05 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
> Hi Yin, thanks much your answer solved my problem. Really appreciate it!
>
> Regards
>
>
> On Fri, Jan 8, 2016 at 1:26 AM, Yin Huai <yh...@databricks.com> wrot
Hi Eran,
Can you try 1.6? With the change in
https://github.com/apache/spark/pull/10288, the JSON data source will not throw
a runtime exception if there is any record that it cannot parse. Instead,
it will put the entire record in the "_corrupt_record" column.
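A minimal sketch of that behavior (the path is hypothetical; assumes the implicits import for the $ column syntax):
import sqlContext.implicits._
// Records that cannot be parsed are kept and surfaced in "_corrupt_record".
val df = sqlContext.read.json("/path/to/data.json")
val bad = df.filter($"_corrupt_record".isNotNull)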
Thanks,
Yin
On Sun, Dec 20, 2015
Can you attach the result of eventDF.filter($"entityType" ===
"user").select("entityId").distinct.explain(true)?
Thanks,
Yin
On Thu, Nov 5, 2015 at 1:12 AM, 千成徳 wrote:
> Hi All,
>
> I have data frame like this.
>
> Equality expression is not working in 1.5.1 but works as
Hi Ross,
What version of Spark are you using? There were two issues that affected
the results of window functions in the Spark 1.5 branch. Both issues have
been fixed and will be released with Spark 1.5.2 (this release will happen
soon). For more details of these two issues, you can take a look at
@filthysocks, can you get the output of jmap -histo before the OOM (
http://docs.oracle.com/javase/7/docs/technotes/tools/share/jmap.html)?
On Mon, Oct 26, 2015 at 6:35 AM, filthysocks wrote:
> We upgrade from 1.4.1 to 1.5 and it's a pain
> see
>
>
btw, what version of Spark did you use?
On Mon, Oct 19, 2015 at 1:08 PM, YaoPau wrote:
> I've connected Spark SQL to the Hive Metastore and currently I'm running
> SQL
> code via pyspark. Typically everything works fine, but sometimes after a
> long-running Spark SQL job I
Hi Pala,
Can you add the full stack trace of the exception? For now, can you use
CREATE TEMPORARY FUNCTION to work around the issue?
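A minimal sketch of that workaround (the jar path, class, and function name are hypothetical; assumes a HiveContext):
hiveContext.sql("ADD JAR /path/to/my-udfs.jar")
hiveContext.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF'")
hiveContext.sql("SELECT my_udf(col1) FROM my_table")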
Thanks,
Yin
On Wed, Sep 30, 2015 at 11:01 AM, Pala M Muthaia <
mchett...@rocketfuelinc.com.invalid> wrote:
> +user list
>
> On Tue, Sep 29, 2015 at 3:43 PM, Pala
670)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at
> org.apache.spark.repl.SparkILoop$$a
Hi Ritesh,
Right now, we only allow specific data types defined in the inputSchema.
Supporting abstract types (e.g. NumericType) may cause the logic of a UDAF to
be more complex. It will be great to understand the use cases first. What
kinds of input data types do you want to support and
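For reference, a minimal UDAF sketch with a concrete input type (the class name and column names are illustrative):
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// inputSchema must name a concrete type such as DoubleType; abstract types
// like NumericType are not accepted.
class SumDouble extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("sum", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0.0 }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) buffer(0) = buffer.getDouble(0) + input.getDouble(0)
  }
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
  }
  def evaluate(buffer: Row): Any = buffer.getDouble(0)
}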
Seems 1.4 has the same issue.
On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai <yh...@databricks.com> wrote:
> btw, does 1.4 have the same problem?
>
> On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai <yh...@databricks.com> wrote:
>
>> Hi Jerry,
>>
>> Looks like it
/browse/SPARK-10731>).
>>
>> Best Regards,
>>
>> Jerry
>>
>>
>> On Mon, Sep 21, 2015 at 1:01 PM, Yin Huai <yh...@databricks.com> wrote:
>>
>>> btw, does 1.4 have the same problem?
>>>
>>> On Mon, Sep 21, 2015 at 10:01
Hi Jerry,
Looks like it is a Python-specific issue. Can you create a JIRA?
Thanks,
Yin
On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote:
> Hi Spark Developers,
>
> I just ran some very simple operations on a dataset. I was surprised by the
> execution plan of take(1),
btw, does 1.4 have the same problem?
On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai <yh...@databricks.com> wrote:
> Hi Jerry,
>
> Looks like it is a Python-specific issue. Can you create a JIRA?
>
> Thanks,
>
> Yin
>
> On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam <
tting. Is this really the expected behavior? Never seen something
> returning null in other Scala tools that I used.
>
> Regards,
>
>
> 2015-09-14 18:54 GMT-03:00 Yin Huai <yh...@databricks.com>:
>
>> btw, move it to user list.
>>
>> On Mon, Sep 14, 2015 at 2:
btw, move it to user list.
On Mon, Sep 14, 2015 at 2:54 PM, Yin Huai <yh...@databricks.com> wrote:
> A scale of 10 means that there are 10 digits to the right of the decimal
> point. If you also have precision 10, the range of your data will be [0, 1)
> and casting "10.5"
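A minimal sketch of that behavior (assumes the cast-to-null semantics described in the thread):
// With DECIMAL(10, 10) every digit is fractional, so the representable range
// is [0, 1) and "10.5" comes back as null.
sqlContext.sql("SELECT CAST('10.5' AS DECIMAL(10, 10)) AS d").show()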
Looks like you hit https://issues.apache.org/jira/browse/SPARK-10422, it
has been fixed in branch 1.5. 1.5.1 release will have it.
On Fri, Sep 11, 2015 at 3:35 AM, guoqing0...@yahoo.com.hk <
guoqing0...@yahoo.com.hk> wrote:
> Hi all ,
> After upgrade spark to 1.5 , Streaming throw
>
For now, user-defined window functions are not supported. We will add support
in the future.
On Mon, Aug 24, 2015 at 6:26 AM, xander92 alexander.fra...@ompnt.com
wrote:
The ultimate aim of my program is to be able to wrap an arbitrary Scala
function (mostly will be statistics / customized rolling window
Can you try to use HiveContext instead of SQLContext? Your query is trying
to create a table and persist the metadata of the table in the metastore, which
is only supported by HiveContext.
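A minimal sketch of the suggestion (the table definition is hypothetical):
import org.apache.spark.sql.hive.HiveContext

// HiveContext can persist table metadata in the metastore; plain SQLContext cannot.
val hiveContext = new HiveContext(sc)
hiveContext.sql("CREATE TABLE my_table (id INT, name STRING)")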
On Wed, Aug 19, 2015 at 8:44 AM, Yusuf Can Gürkan yu...@useinsider.com
wrote:
Hello,
I’m trying to create a
Hi Todd,
We have not had a chance to update it. We will update it after the 1.5 release.
Thanks,
Yin
On Thu, Aug 13, 2015 at 6:49 AM, Todd bit1...@163.com wrote:
Hi,
I got a question about the spark-sql-perf project by Databricks at
https://github.com/databricks/spark-sql-perf/
The
We do this in SparkILoop (
https://github.com/apache/spark/blob/master/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L1023-L1037).
What is the version of Spark you are using? How did you add the spark-csv
jar?
On Thu, Jul 16, 2015 at 1:21 PM, Koert Kuipers
No problem:) Glad to hear that!
On Thu, Jul 16, 2015 at 8:22 PM, Koert Kuipers ko...@tresata.com wrote:
that solved it, thanks!
On Thu, Jul 16, 2015 at 6:22 PM, Koert Kuipers ko...@tresata.com wrote:
thanks i will try 1.4.1
On Thu, Jul 16, 2015 at 5:24 PM, Yin Huai yh...@databricks.com
Your query will be partitioned once. Then, a single Window operator will
evaluate these three functions. As mentioned by Harish, you can take a look
at the plan (sql(your sql...).explain()).
On Mon, Jul 13, 2015 at 12:26 PM, Harish Butani rhbutani.sp...@gmail.com
wrote:
Just once.
You can see
Hi Brandon,
Can you explain what you meant by "It simply does not work"? Did you not
see new data files?
Thanks,
Yin
On Fri, Jul 10, 2015 at 11:55 AM, Brandon White bwwintheho...@gmail.com
wrote:
Why does this not work? Is insert into broken in 1.3.1? It does not throw
any errors, fail, or
Jerrick,
Let me ask a few clarification questions. What is the version of Spark? Is
the table a hive table? What is the format of the table? Is the table
partitioned?
Thanks,
Yin
On Sun, Jul 12, 2015 at 6:01 PM, ayan guha guha.a...@gmail.com wrote:
Describe computes statistics, so it will
variable, however, as
the behavior of the shell changed from hanging to quitting when the env var
value got to 1g.
/Sim
Simeon Simeonov, Founder CTO, Swoop http://swoop.com/
@simeons http://twitter.com/simeons | blog.simeonov.com | 617.299.6746
From: Yin Huai yh...@databricks.com
4g
/Sim
Simeon Simeonov, Founder CTO, Swoop http://swoop.com/
@simeons http://twitter.com/simeons | blog.simeonov.com | 617.299.6746
From: Yin Huai yh...@databricks.com
Date: Monday, July 6, 2015 at 12:59 AM
To: Simeon Simeonov s...@swoop.com
Cc: Denny Lee denny.g@gmail.com
the test file so you can reproduce this in your
own environment.
/Sim
Simeon Simeonov, Founder CTO, Swoop http://swoop.com/
@simeons http://twitter.com/simeons | blog.simeonov.com | 617.299.6746
From: Yin Huai yh...@databricks.com
Date: Sunday, July 5, 2015 at 11:04 PM
To: Denny Lee denny.g
Sim,
Can you increase the PermGen size? Please let me know what your setting is
when the problem disappears.
Thanks,
Yin
On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee denny.g@gmail.com wrote:
I had run into the same problem where everything was working swimmingly
with Spark 1.3.1. When I
HiveContext's
methods). Can you just use the sqlContext created by the shell and try
again?
Thanks,
Yin
On Thu, Jul 2, 2015 at 12:50 PM, Yin Huai yh...@databricks.com wrote:
Hi Sim,
Spark 1.4.0's memory consumption on PermGen is higher than Spark 1.3
(explained in https://issues.apache.org/jira/browse
Hi Sim,
Spark 1.4.0's memory consumption on PermGen is higher than Spark 1.3
(explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you
add --conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256m in the
command you used to launch Spark shell? This will increase the PermGen size
The function accepted by explode is f: Row => TraversableOnce[A]. Seems
user_mentions is an array of structs. So, can you change your
pattern matching to the following?
case Row(rows: Seq[_]) => rows.asInstanceOf[Seq[Row]].map(elem => ...)
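A slightly fuller sketch under the same assumption that user_mentions is an array of structs (the Mention case class and the field position are illustrative; df is the DataFrame from the thread):
import org.apache.spark.sql.Row
import sqlContext.implicits._

case class Mention(screenName: String)
// One output row per element of the user_mentions array.
val mentions = df.explode($"user_mentions") {
  case Row(elems: Seq[_]) =>
    elems.asInstanceOf[Seq[Row]].map(r => Mention(r.getString(0)))
}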
On Wed, Jun 24, 2015 at 5:27 AM, Gustavo Arjones
Hi James,
Maybe it's the DISTINCT causing the issue.
I rewrote the query as follows. Maybe this one can finish faster.
select
sum(cnt) as uses,
count(id) as users
from (
select
count(*) cnt,
cast(id as string) as id
from usage_events
where
btw, user list will be a better place for this thread.
On Thu, Jun 18, 2015 at 8:19 AM, Yin Huai yh...@databricks.com wrote:
Is it the full stack trace?
On Thu, Jun 18, 2015 at 6:39 AM, Sea 261810...@qq.com wrote:
Hi, all:
I want to run spark sql on yarn(yarn-client), but ... I already
Are you writing to an existing hive orc table?
On Wed, Jun 17, 2015 at 3:25 PM, Cheng Lian lian.cs@gmail.com wrote:
Thanks for reporting this. Would you mind helping to create a JIRA for this?
On 6/16/15 2:25 AM, patcharee wrote:
I found if I move the partitioned columns in schemaString
to open a jira about removing this requirement for
inserting into an existing hive table.
Thanks,
Yin
On Thu, Jun 18, 2015 at 9:39 PM, Yin Huai yh...@databricks.com wrote:
Are you writing to an existing hive orc table?
On Wed, Jun 17, 2015 at 3:25 PM, Cheng Lian lian.cs@gmail.com wrote
Hi John,
Did you also set spark.sql.planner.externalSort to true? You will probably
not see executors lost with this conf. For now, maybe you can manually split
the query into two parts, one for skewed keys and one for other records.
Then, you union the results of these two parts together.
Thanks,
shell session, every time I do a write I
get a random number of tasks failing on the first run with the NPE.
Using dynamic allocation of executors in YARN mode. No speculative
execution is enabled.
On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai yh...@databricks.com wrote:
I saw it once but I
I saw it once but I was not clear how to reproduce it. The jira I created
is https://issues.apache.org/jira/browse/SPARK-7837.
More information will be very helpful. Were those errors from speculative
tasks or regular tasks (the first attempt of the task)? Is this error
deterministic (can you
Hi Chris,
Have you imported org.apache.spark.sql.functions._?
Thanks,
Yin
On Fri, Jun 12, 2015 at 8:05 AM, Chris Freeman cfree...@alteryx.com wrote:
I’m trying to iterate through a list of Columns and create new Columns
based on a condition. However, the when method keeps giving me errors
Hi Doug,
For now, I think you can use sqlContext.sql("USE databaseName") to change
the current database.
Thanks,
Yin
On Thu, Jun 4, 2015 at 12:04 PM, Yin Huai yh...@databricks.com wrote:
Hi Doug,
sqlContext.table does not officially support database name. It only
supports table name
Are you using RC4?
On Wed, Jun 3, 2015 at 10:58 PM, Night Wolf nightwolf...@gmail.com wrote:
Thanks Yin, that seems to work with the Shell. But on a compiled
application with Spark-submit it still fails with the same exception.
On Thu, Jun 4, 2015 at 2:46 PM, Yin Huai yh...@databricks.com
.
Thanks,
Doug
On Jun 3, 2015, at 8:21 PM, Yin Huai yh...@databricks.com wrote:
Hi Doug,
Actually, sqlContext.table does not support database names in either Spark
1.3 or Spark 1.4. We will support it in a future version.
Thanks,
Yin
On Wed, Jun 3, 2015 at 10:45 AM, Doug
Hi Doug,
Actually, sqlContext.table does not support database names in either Spark 1.3
or Spark 1.4. We will support it in a future version.
Thanks,
Yin
On Wed, Jun 3, 2015 at 10:45 AM, Doug Balog doug.sparku...@dugos.com
wrote:
Hi,
sqlContext.table(“db.tbl”) isn’t working for me, I get a
Does it happen every time you read a parquet source?
On Tue, Jun 2, 2015 at 3:42 AM, Anders Arpteg arp...@spotify.com wrote:
The log is from the log aggregation tool (hortonworks, yarn logs ...),
so both executors and driver. I'll send a private mail to you with the full
logs. Also, tried
Looks like your program somehow picked up an older version of Parquet (Spark
1.3.1 uses Parquet 1.6.0rc3, and it seems the NO_FILTER field was introduced in
1.6.0rc2). Is it possible that you can check the Parquet lib version in
your classpath?
Thanks,
Yin
On Sat, May 30, 2015 at 2:44 PM, ogoh
Hi Cesar,
We just added it in Spark 1.4.
In Spark 1.4, you can use window functions in HiveContext to do it. Assuming
you want to calculate the cumulative sum for every flag,
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
df.select(
$"flag",
$"price",
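A hedged completion of the same idea (the ordering column for the running sum is an assumption):
val w = Window.partitionBy($"flag").orderBy($"price")
df.select($"flag", $"price", sum($"price").over(w).as("cumulative_sum"))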
Can you attach the full stack trace?
Thanks,
Yin
On Fri, May 8, 2015 at 4:44 PM, barmaley o...@solver.com wrote:
Given a registered table from data frame, I'm able to execute queries like
sqlContext.sql("SELECT STDDEV(col1) FROM table") from Spark Shell just
fine.
However, when I run exactly
, 2015 at 11:00 AM, Yin Huai yh...@databricks.com wrote:
The exception looks like the one mentioned in
https://issues.apache.org/jira/browse/SPARK-4520. What is the version of
Spark?
On Fri, Apr 24, 2015 at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
My data looks like
Hi Sergio,
I missed this thread somehow... For the error "case classes cannot have
more than 22 parameters.", it is a limitation of Scala (see
https://issues.scala-lang.org/browse/SI-7296). You can follow the
instruction at
The exception looks like the one mentioned in
https://issues.apache.org/jira/browse/SPARK-4520. What is the version of
Spark?
On Fri, Apr 24, 2015 at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
My data looks like this:
+---++--+
|
Hi Shuai,
You can use as to create a table alias. For example, df1.as("df1"). Then
you can use $"df1.col" to refer to it.
Thanks,
Yin
On Thu, Apr 23, 2015 at 11:14 AM, Shuai Zheng szheng.c...@gmail.com wrote:
Hi All,
I use 1.3.1
When I have two DF and join them on a same name key, after
Xudong and Rex,
Can you try 1.3.1? With PR 5339 http://github.com/apache/spark/pull/5339 ,
after we get a Hive Parquet table from the metastore and convert it to our native
Parquet code path, we will cache the converted relation. For now, the first
access to that Hive Parquet table reads all of the footers
Hi Cesar,
Can you try 1.3.1 (
https://spark.apache.org/releases/spark-release-1-3-1.html) and see if it
still shows the error?
Thanks,
Yin
On Fri, Apr 17, 2015 at 1:58 PM, Reynold Xin r...@databricks.com wrote:
This is strange. cc the dev list since it might be a bug.
On Thu, Apr 16,
Can you share your code that can reproduce the problem?
On Thu, Apr 16, 2015 at 5:42 AM, Arush Kharbanda ar...@sigmoidanalytics.com
wrote:
Hi
As per JIRA this issue is resolved, but i am still facing this issue.
SPARK-2734 - DROP TABLE should also uncache table
--
[image: Sigmoid Analytics]
, Yin Huai yh...@databricks.com wrote:
Can you share your code that can reproduce the problem?
On Thu, Apr 16, 2015 at 5:42 AM, Arush Kharbanda
ar...@sigmoidanalytics.com wrote:
Hi
As per JIRA this issue is resolved, but i am still facing this issue.
SPARK-2734 - DROP TABLE should also uncache table
For a Map type column, fields['driver'] is the syntax to retrieve the map
value (in the schema, you can see fields: map). The syntax
fields.driver is used for struct types.
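A minimal sketch of the two syntaxes (df and the column/key names follow the thread):
df.selectExpr("fields['driver']")   // map lookup by key
// df.select("fields.driver")       // dot syntax only works when fields is a struct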
On Thu, Apr 16, 2015 at 12:37 AM, jc.francisco jc.francisc...@gmail.com
wrote:
Hi,
I'm new with both Cassandra and Spark
Hi Neal,
spark.sql.shuffle.partitions is the property to control the number of tasks
after shuffle (to generate t2, there is a shuffle for the aggregations
specified by groupBy and agg.) You can use
sqlContext.setConf("spark.sql.shuffle.partitions", "newNumber") or
sqlContext.sql(set
Hi Justin,
Does the schema of your data have any decimal, array, map, or struct type?
Thanks,
Yin
On Tue, Apr 7, 2015 at 6:31 PM, Justin Yip yipjus...@prediction.io wrote:
Hello,
I have a parquet file of around 55M rows (~ 1G on disk). Performing simple
grouping operation is pretty
For cast, you can use the selectExpr method. For example,
df.selectExpr("cast(col1 as int) as col1", "cast(col2 as bigint) as col2").
Or, df.select(df("colA").cast("int"), ...)
On Thu, Apr 2, 2015 at 8:33 PM, Michael Armbrust mich...@databricks.com
wrote:
val df = Seq(("test", 1)).toDF("col1", "col2")
You can
You are hitting https://issues.apache.org/jira/browse/SPARK-6330. It has
been fixed in 1.3.1, which will be released soon.
On Fri, Mar 27, 2015 at 10:42 PM, sud_self 852677...@qq.com wrote:
spark version is 1.3.0 with tachyon-0.6.1
QUESTION DESCRIPTION:
Hello Kevin,
You can take a look at our generic load function
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#generic-loadsave-functions
.
For example, you can use
val df = sqlContext.load("/myData", "parquet")
to load a Parquet dataset stored in /myData as a DataFrame
---
Ananda Basak
Ph: 425-213-7092
*From:* BASAK, ANANDA
*Sent:* Tuesday, March 17, 2015 3:08 PM
*To:* Yin Huai
*Cc:* user@spark.apache.org
*Subject:* RE: Date and decimal datatype not working
Ok, thanks for the suggestions. Let me try and will confirm all.
Regards
Ananda
Yin,
thanks a lot for that! Will give it a shot and let you know.
On 19 March 2015 at 16:30, Yin Huai yh...@databricks.com wrote:
Was the OOM thrown during the execution of the first stage (map) or the
second stage (reduce)? If it was the second stage, can you increase the
value
Can you add your code snippet? Seems it's missing in the original email.
Thanks,
Yin
On Thu, Mar 19, 2015 at 3:22 PM, kamatsuoka ken...@gmail.com wrote:
I'm trying to filter a DataFrame by a date column, with no luck so far.
Here's what I'm doing:
When I run reqs_day.count() I get zero,
created https://issues.apache.org/jira/browse/SPARK-6413 to track the
improvement on the output of DESCRIBE statement.
On Thu, Mar 19, 2015 at 12:11 PM, Yin Huai yh...@databricks.com wrote:
Hi Christian,
Your table is stored correctly in Parquet format.
For saveAsTable, the table created
Hi Christian,
Your table is stored correctly in Parquet format.
For saveAsTable, the table created is *not* a Hive table, but a Spark SQL
data source table (
http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#data-sources).
We are only using Hive's metastore to store the metadata (to
:
Hi Yin,
Thanks for your feedback. I have 1700 parquet files, sized 100MB each. The
number of tasks launched is equal to the number of parquet files. Do you
have any idea on how to deal with this situation?
Thanks a lot
On 18 Mar 2015 17:35, Yin Huai yh...@databricks.com wrote:
Seems
Hi Roberto,
For now, if the timestamp is a top-level column (not a field in a
struct), you can use backticks to quote the column name, like `timestamp`.
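A minimal sketch (the table name is hypothetical):
// Backticks let the parser treat the reserved word "timestamp" as a column name.
sqlContext.sql("SELECT `timestamp` FROM events")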
Thanks,
Yin
On Wed, Mar 18, 2015 at 12:10 PM, Roberto Coluccio
roberto.coluc...@gmail.com wrote:
Hey Cheng, thank you so much for your
it mean
'resolved attribute'?
On Mar 17, 2015 8:14 PM, Yin Huai yh...@databricks.com wrote:
The number is an id we used internally to identify a resolved Attribute.
Looks like basic_null_diluted_d was not resolved since there is no id
associated with it.
On Tue, Mar 17, 2015 at 2:08 PM
p(0) is a String. So, you need to explicitly convert it to a Long, e.g.
p(0).trim.toLong. You also need to do it for p(2). For those BigDecimal
values, you need to create BigDecimal objects from your String values.
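A minimal sketch of those conversions (the sample record is made up):
val p = "42|item|7|19.99".split("\\|")
val id = p(0).trim.toLong           // String -> Long
val qty = p(2).trim.toLong
val amt = BigDecimal(p(3).trim)     // String -> BigDecimal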
On Tue, Mar 17, 2015 at 5:55 PM, BASAK, ANANDA ab9...@att.com wrote:
Hi All,
.
I will check it soon and update.
Can you elaborate on what the # and the number mean? Is that a
reference to the field in the RDD?
Thank you,
Ophir
On Mar 17, 2015 7:06 PM, Yin Huai yh...@databricks.com wrote:
Seems basic_null_diluted_d was not resolved? Can you check
Seems basic_null_diluted_d was not resolved? Can you check if
basic_null_diluted_d is in your table?
On Tue, Mar 17, 2015 at 9:34 AM, Ophir Cohen oph...@gmail.com wrote:
Hi Guys,
I'm registering a function using:
sqlc.registerFunction("makeEstEntry", ReutersDataFunctions.makeEstEntry _)
Then I
Seems you want to use an array for the providers field, like
providers: [{id: ...}, {id: ...}] instead of providers: {{id: ...}, {id: ...}}
On Fri, Mar 13, 2015 at 7:45 PM, kpeng1 kpe...@gmail.com wrote:
Hi All,
I was noodling around with loading in a json file into spark sql's hive
context and
Are you using SQLContext? Right now, the parser in the SQLContext is quite
limited on the data type keywords that it handles (see here
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala#L391)
and unfortunately bigint is not handled
Hi Nitay,
Can you try using backticks to quote the column name? Like
org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType(
"struct<`int`:bigint>")?
Thanks,
Yin
On Tue, Mar 10, 2015 at 2:43 PM, Michael Armbrust mich...@databricks.com
wrote:
Thanks for reporting. This was a result of a change
Hi Edmon,
No, you do not need to install Hive to use Spark SQL.
Thanks,
Yin
On Fri, Mar 6, 2015 at 6:31 AM, Edmon Begoli ebeg...@gmail.com wrote:
Does Spark-SQL require installation of Hive for it to run correctly or
not?
I could not tell from this statement:
Regarding current_date, I think it is not in either Hive 0.12.0 or 0.13.1
(versions that we support). Seems
https://issues.apache.org/jira/browse/HIVE-5472 added it to Hive recently.
On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao hao.ch...@intel.com wrote:
The temp table in metastore can not be
@Shahab, based on https://issues.apache.org/jira/browse/HIVE-5472,
current_date was added in Hive *1.2.0 (not 0.12.0)*. For my previous email,
I meant current_date is in neither Hive 0.12.0 nor Hive 0.13.1 (Spark
SQL currently supports these two Hive versions).
On Tue, Mar 3, 2015 at 8:55 AM,
Is the string of the above JSON object on a single line? jsonFile requires
that every line is a JSON object or an array of JSON objects.
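A minimal sketch of input jsonFile can handle (the path is hypothetical), with one complete JSON object per line:
// {"name": "a", "value": 1}
// {"name": "b", "value": 2}
val df = sqlContext.jsonFile("/path/to/data.json")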
On Mon, Mar 2, 2015 at 11:28 AM, kpeng1 kpe...@gmail.com wrote:
Hi All,
I am currently having issues reading in a json file using spark sql's api.
Here is
Seems you hit https://issues.apache.org/jira/browse/SPARK-4296. It has been
fixed in 1.2.1 and 1.3.
On Thu, Feb 26, 2015 at 1:22 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Can someone confirm if they can run UDFs in group by in spark1.2?
I have two builds running -- one from a custom
Hi Justin,
It is expected. We do not check if the provided schema matches the rows, since
checking would require scanning all rows to give a correct answer.
Thanks,
Yin
On Fri, Feb 13, 2015 at 1:33 PM, Justin Pihony justin.pih...@gmail.com
wrote:
Per the documentation:
It is important to make sure that
Regarding backticks: Right. You need backticks to quote the column name
timestamp because timestamp is a reserved keyword in our parser.
On Tue, Feb 10, 2015 at 3:02 PM, Mohnish Kodnani mohnish.kodn...@gmail.com
wrote:
actually i tried in spark shell , got same error and then for some reason
i
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
was introduced in Hive 0.14 and Spark SQL only supports Hive 0.12 and
0.13.1. Can you change the setting of hive.security.authorization.manager
to one accepted by 0.12 or 0.13.1?
On Thu, Feb 5, 2015
Can you try using backticks to quote the field name? Like `f:price`.
On Tue, Feb 10, 2015 at 5:47 AM, presence2001 neil.andra...@thefilter.com
wrote:
Hi list,
I have some data with a field name of f:price (it's actually part of a JSON
structure loaded from ElasticSearch via
Hello Michael,
In Spark SQL, we have our internal concepts of Output Partitioning
(representing the partitioning scheme of an operator's output) and Required
Child Distribution (representing the requirement of input data distribution
of an operator) for a physical operator. Let's say we have two
Yeah, it's a bug. It has been fixed by
https://issues.apache.org/jira/browse/SPARK-3891 in master.
On Tue, Jan 13, 2015 at 2:41 PM, Ted Yu yuzhih...@gmail.com wrote:
Looking at the source code for AbstractGenericUDAFResolver, the following
(non-deprecated) method should be called:
public
Hello Jianshi,
You meant you want to convert a Map to a Struct, right? We can extract some
useful functions from JsonRDD.scala, so others can access them.
Thanks,
Yin
On Mon, Dec 8, 2014 at 1:29 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
I checked the source code for inferSchema. Looks
Hello Jonathan,
There was a bug regarding casting data types before inserting into a Hive
table. Hive does not have the notion of containsNull for array values.
So, for a Hive table, containsNull will always be true for an array and
we should ignore this field for Hive. This issue has been
For Hive on Spark, did you mean the Thrift server of Spark SQL or
https://issues.apache.org/jira/browse/HIVE-7292? If you meant the latter
one, I think Hive's mailing list will be a good place to ask (see
https://hive.apache.org/mailing_lists.html).
Thanks,
Yin
On Wed, Nov 26, 2014 at 10:49 PM,
I guess you want to use split("\\|") instead of split("|").
On Tue, Nov 25, 2014 at 4:51 AM, Cheng Lian lian.cs@gmail.com wrote:
Which version are you using? Or if you are using the most recent master or
branch-1.2, which commit are you using?
On 11/25/14 4:08 PM, david wrote:
Hi,
I
Hello Jianshi,
The reason for that error is that we do not have a Spark SQL data type for
Scala BigInt. You can use Decimal for your case.
Thanks,
Yin
On Fri, Nov 21, 2014 at 5:11 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I got an error during rdd.registerTempTable(...) saying
Oh, actually, we do not support MapType in the schema given to
jsonRDD at the moment (my bad..). Daniel, you need to wait for the patch for
4476 (I should have one soon).
Thanks,
Yin
On Wed, Nov 19, 2014 at 2:32 PM, Daniel Haviv danielru...@gmail.com wrote:
Thank you Michael
I will
Hello Brian,
Right now, MapType is not supported in the StructType provided to
jsonRDD/jsonFile. We will add support for it. I have created
https://issues.apache.org/jira/browse/SPARK-4302 to track this issue.
Thanks,
Yin
On Fri, Nov 7, 2014 at 3:41 PM, boclair bocl...@gmail.com wrote:
I'm
Hello Kevin,
https://issues.apache.org/jira/browse/SPARK-3891 will fix this bug.
Thanks,
Yin
On Wed, Nov 5, 2014 at 8:06 PM, Cheng, Hao hao.ch...@intel.com wrote:
Which version are you using? I can reproduce that in the latest code, but
with a different exception.
I've filed a bug
Hello Tridib,
For your case, you can use StructType(StructField("ParentInfo", parentInfo,
true) :: StructField("ChildInfo", childInfo, true) :: Nil) to create the
StructType representing the schema (parentInfo and childInfo are two
existing StructTypes). You can take a look at our docs (
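A self-contained sketch of that nesting, using the org.apache.spark.sql.types package from Spark 1.3+ (the inner schemas stand in for the existing parentInfo and childInfo):
import org.apache.spark.sql.types._

val parentInfo = StructType(StructField("name", StringType, true) :: Nil)
val childInfo = StructType(StructField("name", StringType, true) :: Nil)
val schema = StructType(
  StructField("ParentInfo", parentInfo, true) ::
  StructField("ChildInfo", childInfo, true) :: Nil)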