Re: Build error: python/lib/pyspark.zip is not a ZIP archive

2020-01-10 Thread Jeff Evans
Actually, there is a really trivial fix for that (an existing file not
being deleted when packaging).  Opened SPARK-30489 for it.
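For anyone hitting this before the fix lands, the check this thread performs with `unzip -l` can be sketched with Python's stdlib `zipfile` module. This is an illustrative stand-in, not Spark build code; the temp-file path simulates a stale `python/lib/pyspark.zip`:

```python
import os
import tempfile
import zipfile

# Reproduce the failure mode: a file named *.zip that is not a valid
# zip archive (standing in for a stale python/lib/pyspark.zip).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "pyspark.zip")
with open(path, "w") as f:
    f.write("not a zip archive")

# zipfile.is_zipfile() is a cheap equivalent of `unzip -l` for this check.
if not zipfile.is_zipfile(path):
    os.remove(path)  # remove the stale file so the build can regenerate it

assert not os.path.exists(path)
```

In a real checkout, the equivalent shell workaround described in this thread is to remove `python/lib/pyspark.zip` and rerun `./build/mvn -DskipTests clean package`.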

On Fri, Jan 10, 2020 at 3:52 PM Jeff Evans 
wrote:

> Thanks for the tip.  Fixed by simply removing python/lib/pyspark.zip
> (since it's apparently generated), and rebuilding.  I guess clean does
> not remove it.
>
> On Fri, Jan 10, 2020 at 3:50 PM Sean Owen  wrote:
>
>> Sounds like you might have some corrupted file locally. I don't see
>> any of the automated test builders failing. Nuke your local assembly
>> build and try again?
>>
>> On Fri, Jan 10, 2020 at 3:49 PM Jeff Evans
>>  wrote:
>> >
>> > Greetings,
>> >
>> > I'm getting an error when building, on latest master (2bd873181 as of
>> this writing).  Full build command I'm running is: ./build/mvn -DskipTests
>> clean package
>> >
>> > [ERROR] Failed to execute goal
>> org.apache.maven.plugins:maven-antrun-plugin:1.8:run (create-tmp-dir) on
>> project spark-assembly_2.12: An Ant BuildException has occured: Problem
>> reading /Users/jeff/dev/spark/python/lib/pyspark.zip
>> > [ERROR] around Ant part ...<zip destfile="/Users/jeff/dev/spark/assembly/../python/lib/pyspark.zip">... @
>> 6:76 in /Users/jeff/dev/spark/assembly/target/antrun/build-main.xml:
>> archive is not a ZIP archive
>> > [ERROR] -> [Help 1]
>> >
>> > Trying to run unzip -l python/lib/pyspark.zip does seem to suggest it's
>> not a valid zip file.  Any ideas what might be wrong?  I tried searching
>> the archives and didn't see anything relevant.  Thanks.
>> >
>> > OS X Catalina 10.15.2
>> > OpenJDK 1.8.0_212
>> > Maven 3.6.3
>> > Python 3.8.1 (via pyenv)
>>
>


Re: Build error: python/lib/pyspark.zip is not a ZIP archive

2020-01-10 Thread Jeff Evans
Thanks for the tip.  Fixed by simply removing python/lib/pyspark.zip (since
it's apparently generated), and rebuilding.  I guess clean does not remove
it.

On Fri, Jan 10, 2020 at 3:50 PM Sean Owen  wrote:

> Sounds like you might have some corrupted file locally. I don't see
> any of the automated test builders failing. Nuke your local assembly
> build and try again?
>
> On Fri, Jan 10, 2020 at 3:49 PM Jeff Evans
>  wrote:
> >
> > Greetings,
> >
> > I'm getting an error when building, on latest master (2bd873181 as of
> this writing).  Full build command I'm running is: ./build/mvn -DskipTests
> clean package
> >
> > [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-antrun-plugin:1.8:run (create-tmp-dir) on
> project spark-assembly_2.12: An Ant BuildException has occured: Problem
> reading /Users/jeff/dev/spark/python/lib/pyspark.zip
> > [ERROR] around Ant part ...<zip destfile="/Users/jeff/dev/spark/assembly/../python/lib/pyspark.zip">... @
> 6:76 in /Users/jeff/dev/spark/assembly/target/antrun/build-main.xml:
> archive is not a ZIP archive
> > [ERROR] -> [Help 1]
> >
> > Trying to run unzip -l python/lib/pyspark.zip does seem to suggest it's
> not a valid zip file.  Any ideas what might be wrong?  I tried searching
> the archives and didn't see anything relevant.  Thanks.
> >
> > OS X Catalina 10.15.2
> > OpenJDK 1.8.0_212
> > Maven 3.6.3
> > Python 3.8.1 (via pyenv)
>


Re: Build error: python/lib/pyspark.zip is not a ZIP archive

2020-01-10 Thread Sean Owen
Sounds like you might have some corrupted file locally. I don't see
any of the automated test builders failing. Nuke your local assembly
build and try again?

On Fri, Jan 10, 2020 at 3:49 PM Jeff Evans
 wrote:
>
> Greetings,
>
> I'm getting an error when building, on latest master (2bd873181 as of this 
> writing).  Full build command I'm running is: ./build/mvn -DskipTests clean 
> package
>
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-antrun-plugin:1.8:run (create-tmp-dir) on 
> project spark-assembly_2.12: An Ant BuildException has occured: Problem 
> reading /Users/jeff/dev/spark/python/lib/pyspark.zip
> [ERROR] around Ant part ...<zip destfile="/Users/jeff/dev/spark/assembly/../python/lib/pyspark.zip">... @ 
> 6:76 in /Users/jeff/dev/spark/assembly/target/antrun/build-main.xml: archive 
> is not a ZIP archive
> [ERROR] -> [Help 1]
>
> Trying to run unzip -l python/lib/pyspark.zip does seem to suggest it's not a 
> valid zip file.  Any ideas what might be wrong?  I tried searching the 
> archives and didn't see anything relevant.  Thanks.
>
> OS X Catalina 10.15.2
> OpenJDK 1.8.0_212
> Maven 3.6.3
> Python 3.8.1 (via pyenv)

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Build error: python/lib/pyspark.zip is not a ZIP archive

2020-01-10 Thread Jeff Evans
Greetings,

I'm getting an error when building, on latest master (2bd873181 as of this
writing).  Full build command I'm running is: ./build/mvn -DskipTests clean
package

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-antrun-plugin:1.8:run (create-tmp-dir) on
project spark-assembly_2.12: An Ant BuildException has occured: Problem
reading /Users/jeff/dev/spark/python/lib/pyspark.zip
[ERROR] around Ant part ...<zip destfile="/Users/jeff/dev/spark/assembly/../python/lib/pyspark.zip">... @
6:76 in /Users/jeff/dev/spark/assembly/target/antrun/build-main.xml:
archive is not a ZIP archive
[ERROR] -> [Help 1]

Trying to run unzip -l python/lib/pyspark.zip does seem to suggest it's not
a valid zip file.  Any ideas what might be wrong?  I tried searching the
archives and didn't see anything relevant.  Thanks.

   - OS X Catalina 10.15.2
   - OpenJDK 1.8.0_212
   - Maven 3.6.3
   - Python 3.8.1 (via pyenv)


Re: [DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Reynold Xin
Introducing a new data type has high overhead, both in terms of internal 
complexity and users' cognitive load. Introducing two data types would have 
even higher overhead.

I looked quickly, and it looks like both Redshift and Snowflake, two of the 
most recent SQL analytics successes, have only one interval type, and don't 
support storing it. That gets me thinking that, in reality, storing the 
interval type is not that useful.

Do we really need to do this? One of the worst things we can do as a community 
is to introduce features that are almost never used, but at the same time have 
high internal complexity for maintenance.

On Fri, Jan 10, 2020 at 10:45 AM, Dongjoon Hyun < dongjoon.h...@gmail.com > 
wrote:

> 
> Thank you for clarification.
> 
> 
> Bests,
> Dongjoon.
> 
> On Fri, Jan 10, 2020 at 10:07 AM Kent Yao <yaooq...@qq.com> wrote:
> 
> 
>> 
>> Hi Dongjoon,
>> 
>> 
>> Yes. As we want to make CalendarIntervalType deprecated, so far we have
>> just found:
>> 1. The make_interval function that produces legacy CalendarIntervalType
>> values,
>> 2. `interval` -> CalendarIntervalType support in the parser
>> 
>> 
>> Thanks
>> 
>> 
>> *Kent Yao*
>> Data Science Center, Hangzhou Research Institute, Netease Corp.
>> PHONE: (86) 186-5715-3499
>> EMAIL: hzyao...@corp.netease.com
>> 
>> 
>> On 01/11/2020 01:57, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>> 
>>> Hi, Kent. 
>>> 
>>> 
>>> Thank you for the proposal.
>>> 
>>> 
>>> Does your proposal need to revert something from the master branch?
>>> I'm just asking because it's not clear in the proposal document.
>>> 
>>> 
>>> Bests,
>>> Dongjoon.
>>> 
>>> On Fri, Jan 10, 2020 at 5:31 AM Dr. Kent Yao <yaooq...@qq.com> wrote:
>>> 
>>> 
 Hi, Devs
 
 I’d like to propose to add two new interval types which are year-month and
 
 day-time intervals for better ANSI support and future improvements. We
 will
 keep the current CalenderIntervalType but mark it as deprecated until we
 find the right time to remove it completely. The backward compatibility of
 
 the old interval type usages in 2.4 will be guaranteed.
 
 Here is the design doc:
 
 [SPIP] Support Year-Month and Day-Time Intervals -
 https://docs.google.com/document/d/1JNRzcBk4hcm7k2cOXSG1A9U9QM2iNGQzBSXZzScUwAU/edit?usp=sharing
 
 All comments are welcome!
 
 Thanks,
 
 Kent Yao
 
 
 
 
 --
 Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
 
 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> 
>>> 
>>> 
>> 
>> 
> 
>

Re: [DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Dongjoon Hyun
Thank you for clarification.

Bests,
Dongjoon.

On Fri, Jan 10, 2020 at 10:07 AM Kent Yao  wrote:

> Hi Dongjoon,
>
> Yes. As we want to make CalendarIntervalType deprecated, so far we have
> just found:
> 1. The make_interval function that produces legacy CalendarIntervalType
> values,
> 2. `interval` -> CalendarIntervalType support in the parser
>
> Thanks
>
> *Kent Yao*
> Data Science Center, Hangzhou Research Institute, Netease Corp.
> PHONE: (86) 186-5715-3499
> EMAIL: hzyao...@corp.netease.com
>
> On 01/11/2020 01:57, Dongjoon Hyun wrote:
>
> Hi, Kent.
>
> Thank you for the proposal.
>
> Does your proposal need to revert something from the master branch?
> I'm just asking because it's not clear in the proposal document.
>
> Bests,
> Dongjoon.
>
> On Fri, Jan 10, 2020 at 5:31 AM Dr. Kent Yao  wrote:
>
>> Hi, Devs
>>
>> I’d like to propose to add two new interval types which are year-month and
>> day-time intervals for better ANSI support and future improvements. We
>> will
>> keep the current CalenderIntervalType but mark it as deprecated until we
>> find the right time to remove it completely. The backward compatibility of
>> the old interval type usages in 2.4 will be guaranteed.
>>
>> Here is the design doc:
>>
>> [SPIP] Support Year-Month and Day-Time Intervals -
>>
>> https://docs.google.com/document/d/1JNRzcBk4hcm7k2cOXSG1A9U9QM2iNGQzBSXZzScUwAU/edit?usp=sharing
>>
>> All comments are welcome!
>>
>> Thanks,
>>
>> Kent Yao
>>
>>
>>
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Kent Yao







Hi Dongjoon,

Yes. As we want to make CalendarIntervalType deprecated, so far we have
just found:
1. The make_interval function that produces legacy CalendarIntervalType
values,
2. `interval` -> CalendarIntervalType support in the parser

Thanks

Kent Yao
Data Science Center, Hangzhou Research Institute, Netease Corp.
PHONE: (86) 186-5715-3499
EMAIL: hzyao...@corp.netease.com

On 01/11/2020 01:57, Dongjoon Hyun wrote:

Hi, Kent.

Thank you for the proposal.

Does your proposal need to revert something from the master branch?
I'm just asking because it's not clear in the proposal document.

Bests,
Dongjoon.

On Fri, Jan 10, 2020 at 5:31 AM Dr. Kent Yao  wrote:

Hi, Devs

I’d like to propose to add two new interval types which are year-month and
day-time intervals for better ANSI support and future improvements. We will
keep the current CalenderIntervalType but mark it as deprecated until we
find the right time to remove it completely. The backward compatibility of
the old interval type usages in 2.4 will be guaranteed.

Here is the design doc:

[SPIP] Support Year-Month and Day-Time Intervals -
https://docs.google.com/document/d/1JNRzcBk4hcm7k2cOXSG1A9U9QM2iNGQzBSXZzScUwAU/edit?usp=sharing

All comments are welcome!

Thanks,

Kent Yao




--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org








Re: [DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Dongjoon Hyun
Hi, Kent.

Thank you for the proposal.

Does your proposal need to revert something from the master branch?
I'm just asking because it's not clear in the proposal document.

Bests,
Dongjoon.

On Fri, Jan 10, 2020 at 5:31 AM Dr. Kent Yao  wrote:

> Hi, Devs
>
> I’d like to propose to add two new interval types which are year-month and
> day-time intervals for better ANSI support and future improvements. We will
> keep the current CalenderIntervalType but mark it as deprecated until we
> find the right time to remove it completely. The backward compatibility of
> the old interval type usages in 2.4 will be guaranteed.
>
> Here is the design doc:
>
> [SPIP] Support Year-Month and Day-Time Intervals -
>
> https://docs.google.com/document/d/1JNRzcBk4hcm7k2cOXSG1A9U9QM2iNGQzBSXZzScUwAU/edit?usp=sharing
>
> All comments are welcome!
>
> Thanks,
>
> Kent Yao
>
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


unsubscribe

2020-01-10 Thread vijendra rana
unsubscribe


Re: unsubscribe

2020-01-10 Thread steve goodwin


unsubscribe


[DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Dr. Kent Yao
Hi, Devs

I’d like to propose to add two new interval types which are year-month and
day-time intervals for better ANSI support and future improvements. We will
keep the current CalenderIntervalType but mark it as deprecated until we
find the right time to remove it completely. The backward compatibility of
the old interval type usages in 2.4 will be guaranteed.

Here is the design doc:

[SPIP] Support Year-Month and Day-Time Intervals -
https://docs.google.com/document/d/1JNRzcBk4hcm7k2cOXSG1A9U9QM2iNGQzBSXZzScUwAU/edit?usp=sharing

All comments are welcome!

Thanks,

Kent Yao
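To illustrate why year-month and day-time intervals are distinct types: a day-time interval is a fixed-length duration, while a year-month interval counts calendar months, whose length in days varies. A minimal Python sketch of the distinction follows; the `add_months` helper is illustrative only and is not Spark code:

```python
import calendar
from datetime import date, timedelta

# A day-time interval is a fixed duration: adding 30 days is unambiguous.
day_time = timedelta(days=30)
assert date(2020, 1, 31) + day_time == date(2020, 3, 1)

# A year-month interval counts calendar months, so its length in days
# varies. The stdlib has no month interval; this helper sketches one.
def add_months(d: date, months: int) -> date:
    years, month0 = divmod(d.month - 1 + months, 12)
    year = d.year + years
    # Clamp the day to the target month's length, as year-month interval
    # arithmetic in SQL engines commonly does.
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)

# Jan 31 + 1 month lands on Feb 29 in a leap year, not a fixed 30/31 days.
assert add_months(date(2020, 1, 31), 1) == date(2020, 2, 29)
```

Because the two interval kinds obey different arithmetic, mixing them in one type (as CalendarIntervalType does) makes comparison and ordering ill-defined, which is part of the motivation for splitting them.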




--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Revisiting Python / pandas UDF (new proposal)

2020-01-10 Thread Hyukjin Kwon
Hi all, I made a PR - https://github.com/apache/spark/pull/27165
Please have a look when you find some time.

I also addressed another point (raised by Maciej), "A couple of
less-intuitive pandas UDF types", because the more I looked, the more I
felt it should be dealt with together with this proposal.
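For context on the type-hints idea: the proposal distinguishes pandas UDF variants by the function's annotations rather than an explicit type enum. Below is a toy, self-contained sketch of that dispatch idea; the function names, categories, and matching rules here are illustrative and are not Spark's actual API:

```python
import inspect

def infer_udf_kind(func):
    """Guess a (hypothetical) UDF variant from a function's type hints."""
    sig = inspect.signature(func)
    args = [p.annotation for p in sig.parameters.values()]
    ret = sig.return_annotation
    if args and all(a == "pd.Series" for a in args) and ret == "pd.Series":
        return "SCALAR"       # Series -> Series, applied batch-wise
    if args and all(a == "pd.Series" for a in args) and ret == "float":
        return "GROUPED_AGG"  # Series -> scalar, applied per group
    return "UNKNOWN"

# String annotations keep this sketch runnable without pandas installed.
def plus_one(s: "pd.Series") -> "pd.Series":
    return s + 1

def mean(s: "pd.Series") -> "float":
    return s.sum() / len(s)

assert infer_udf_kind(plus_one) == "SCALAR"
assert infer_udf_kind(mean) == "GROUPED_AGG"
```

The appeal of this style is that the signature itself documents the input/output shapes, so the UDF variant no longer has to be named twice (once in the decorator, once implicitly in the function body).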


On Mon, Jan 6, 2020 at 10:52 PM, Hyukjin Kwon wrote:

> I happened to propose a somewhat big refactoring PR as a preparation for
> this.
> Basically, grouping all related codes into one sub-package since currently
> all pandas and PyArrow related codes are here and there.
> I would appreciate if you guys can review and give some feedback.
>
> https://github.com/apache/spark/pull/27109
>
> Thanks!
>
>
> On Sat, Jan 4, 2020 at 5:11 AM, Li Jin wrote:
>
>> Hyukjin,
>>
>> Thanks for putting this together. I took a look at the proposal and left
>> some comments. At the high level, I like using type hints to specify
>> input/output types, but I am not so sure about using type hints for
>> cardinality. I have commented in more detail in the doc.
>>
>> Li
>>
>> On Thu, Jan 2, 2020 at 9:42 AM Li Jin  wrote:
>>
>>> I am going to review this carefully today. Thanks for the work!
>>>
>>> Li
>>>
>>> On Wed, Jan 1, 2020 at 10:34 PM Hyukjin Kwon 
>>> wrote:
>>>
 Thanks for comments Maciej - I am addressing them.
 adding Li Jin too.

 I plan to proceed with this late this week or early next week to make it
 on time before code freeze.
 I am going to respond pretty actively, so please give feedback if
 there's any :-).



 On Mon, Dec 30, 2019 at 6:45 PM, Hyukjin Kwon wrote:

> Hi all,
>
> I happen to come up with another idea about pandas redesign.
> Thanks Reynold, Bryan, Xiangrui, Takuya and Tim for offline
> discussions and
> helping me to write this proposal.
>
> Please take a look and let me know what you guys think.
>
> -
> https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing
> - https://issues.apache.org/jira/browse/SPARK-28264
>
> I know it's a holiday season but please have some time to take a look
> so
> we can make it on time before code freeze (31st Jan).
>
>