Default size of a datatype in SparkSQL

2015-10-07 Thread vivek bhaskar
I want to understand what the default size of a given datatype is used for.

The following link mentions that it is for internal size estimation:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DataType.html

This is also reflected in the code, where the default value seems to be
used for statistics only.

But the default size of the String datatype is 4096. Why was this
seemingly arbitrary number chosen? Does it also restrict the size of the
data? Any further explanation of how the string datatype works would also
help.

Regards,
Vivek
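
As a rough illustration of what "internal size estimation" can mean, the
Python sketch below multiplies a row count by the sum of per-column default
sizes. It is not Spark code: the names DEFAULT_SIZE and
estimate_size_in_bytes are hypothetical, and every size except the 4096
quoted in the question is an assumption made for the example.

```python
# Hypothetical per-type default sizes in bytes. The 4096 for strings is the
# figure quoted in the question; the other values are assumed for illustration.
DEFAULT_SIZE = {"int": 4, "long": 8, "double": 8, "string": 4096}


def estimate_size_in_bytes(schema, row_count):
    """Guess a relation's size as row_count * sum of per-column default sizes."""
    row_width = sum(DEFAULT_SIZE[type_name] for _, type_name in schema)
    return row_width * row_count


schema = [("id", "long"), ("name", "string")]
estimate = estimate_size_in_bytes(schema, 1000)

# The estimate never looks at the values themselves, so a string longer than
# the default size is still a perfectly valid value; only the guess is off.
long_name = "x" * 10000
```

Under this reading, a default size acts only as a planning guess when no
real statistics are available; it places no limit on how long a stored
string can be.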


Re: Is there any Spark SQL reference manual?

2015-09-14 Thread vivek bhaskar
Thanks Richard, Ted. I hope we have a reference available soon.

Peymen, I had looked at this link before, but I was looking for something
with broader coverage.

PS: Richard, kindly advise me on generating a BNF description of the
grammar via the Derby build scripts. Since this may not concern the
spark-user list, please reply to me privately.

Regards,
Vivek

On Fri, Sep 11, 2015 at 9:24 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Very nice suggestion, Richard.
>
> I logged SPARK-10561 referencing this discussion.
>
> On Fri, Sep 11, 2015 at 8:15 AM, Richard Hillegas <rhil...@us.ibm.com>
> wrote:
>
>> The latest Derby SQL Reference manual (version 10.11) can be found here:
>> https://db.apache.org/derby/docs/10.11/ref/index.html. It is, indeed,
>> very useful to have a comprehensive reference guide. The Derby build
>> scripts can also produce a BNF description of the grammar--but that is not
>> part of the public documentation for the project. The BNF is trivial to
>> generate because it is an artifact of the JavaCC grammar generator which
>> Derby uses.
>>
>> I appreciate the difficulty of maintaining a formal reference guide for a
>> rapidly evolving SQL dialect like Spark's.
>>
>> A machine-generated BNF, however, is easy to imagine. But perhaps not so
>> easy to implement. Spark's SQL grammar is implemented in Scala, extending
>> the DSL support provided by the Scala language. I am new to programming in
>> Scala, so I don't know whether the Scala ecosystem provides any good tools
>> for reverse-engineering a BNF from a class which extends
>> scala.util.parsing.combinator.syntactical.StandardTokenParsers.
>>
>> Thanks,
>> -Rick
>


Re: Is there any Spark SQL reference manual?

2015-09-11 Thread vivek bhaskar
Hi Ted,

The link you mention does not have a complete list of supported syntax.
For example, some supported syntax is listed under "Supported Hive
features", but that list does not claim to be exhaustive (and even if it
were, one would have to filter out many lines from the Hive QL reference
and still not be sure it is complete, because of version mismatches).

A quick online search turns up another popular open source project that
has a good SQL reference:
https://db.apache.org/derby/docs/10.1/ref/crefsqlj23296.html.

I had a similar expectation when I was looking for all supported DDL and
DML syntax along with their extensions. For example:
a. Select expressions along with supported extensions, i.e. where clause,
group by, the different supported joins, etc.
b. SQL format for Create, Insert, Alter table, etc.
c. SQL for Insert, Update, Delete, etc., along with their extensions.
d. Syntax for view creation, if supported.
e. Syntax for the explain mechanism.
f. List of supported functions, operators, etc. I can see that hundreds of
functions were added in 1.5, but you then have to cross-check a lot
between the code and JIRA tickets.

So I wanted a piece of documentation that provides all such information
in a single place.

Regards,
Vivek





On Fri, Sep 11, 2015 at 4:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> You may have seen this:
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
> Please suggest what should be added.
>
> Cheers


Is there any Spark SQL reference manual?

2015-09-11 Thread vivek bhaskar
Hi all,

I am looking for a reference manual for Spark SQL, something like many
database vendors have. I could find one for Hive QL
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual) but not
anything specific to Spark SQL.

Please suggest one. A SQL reference specific to the latest release would
be of great help.

Regards,
Vivek