Re: conver panda image column to spark dataframe

2023-08-03 Thread Sean Owen
pp4 has one row, I'm guessing - containing an array of 10 images. You want
10 rows of 1 image each.
But just don't do this. Pass the bytes of the image as a flat array,
along with width/height/channels, and reshape it on use. It's just easier.
That is how the Spark image representation works anyway.
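
A minimal sketch of the flat-bytes layout described above, in plain Python so it runs without Spark. The names ImageRow, encode_image and decode_image are hypothetical, and pixel values are assumed to fit in one byte (0-255). With NumPy the same round trip is usually arr.tobytes() and np.frombuffer(data, dtype=np.uint8).reshape(height, width, channels); in a Spark schema the data field would presumably be a BinaryType column next to IntegerType columns for the dimensions.

```python
from typing import List, NamedTuple


class ImageRow(NamedTuple):
    """One DataFrame row: dimensions plus the pixels as a flat byte string."""
    height: int
    width: int
    channels: int
    data: bytes
    label: int


def encode_image(pixels, label: int) -> ImageRow:
    """Flatten a [height][width][channels] nested sequence of 0-255 ints."""
    height, width, channels = len(pixels), len(pixels[0]), len(pixels[0][0])
    data = bytes(int(v) for row in pixels for px in row for v in px)
    return ImageRow(height, width, channels, data, label)


def decode_image(row: ImageRow) -> List[List[List[int]]]:
    """Rebuild the nested [height][width][channels] structure on use."""
    stride = row.width * row.channels  # number of bytes in one pixel row
    return [
        [
            list(row.data[y * stride + x * row.channels:
                          y * stride + (x + 1) * row.channels])
            for x in range(row.width)
        ]
        for y in range(row.height)
    ]


# A tiny hypothetical 2x2 RGB image standing in for a (500, 333, 3) array.
image = [[[14, 14, 14], [19, 17, 20]],
         [[0, 128, 255], [1, 2, 3]]]
row = encode_image(image, label=7)
restored = decode_image(row)
```

Each image becomes one row of simple column types, so no nested ArrayType schema is needed.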

On Thu, Aug 3, 2023 at 8:43 PM second_co...@yahoo.com.INVALID
 wrote:

> Hello Adrian,
>
>   here is the snippet
>
> import pandas as pd
> import tensorflow_datasets as tfds
> from pyspark.sql.types import (
>     ArrayType, IntegerType, StructField, StructType)
>
> (ds_train, ds_test), ds_info = tfds.load(
>     dataset_name, data_dir='', split=["train", "test"],
>     with_info=True, as_supervised=True
> )
>
> schema = StructType([
>     StructField("image",
>                 ArrayType(ArrayType(ArrayType(IntegerType()))),
>                 nullable=False),
>     StructField("label", IntegerType(), nullable=False)
> ])
> pp4 = spark.createDataFrame(
>     pd.DataFrame(tfds.as_dataframe(ds_train.take(4), ds_info)), schema)
>
>
>
> raised error
>
> TypeError: field image: ArrayType(ArrayType(ArrayType(IntegerType(), True),
> True), True) can not accept object array([[[14, 14, 14],
> [14, 14, 14],
> [14, 14, 14],
> ...,
> [19, 17, 20],
> [19, 17, 20],
> [19, 17, 20]],
>
>
>
>
>
> On Thursday, August 3, 2023 at 11:34:08 PM GMT+8, Adrian Pop-Tifrea <
> poptifreaadr...@gmail.com> wrote:
>
>
> Hello,
>
> can you also please show us how you created the pandas DataFrame? I mean,
> how you added the actual data into the DataFrame. It would help us
> reproduce the error.
>
> Thank you,
> Pop-Tifrea Adrian
>
> On Mon, Jul 31, 2023 at 5:03 AM second_co...@yahoo.com <
> second_co...@yahoo.com> wrote:
>
> I changed to
>
> ArrayType(ArrayType(ArrayType(IntegerType()))) and still get the same error.
>
> Thank you for responding
>
> On Thursday, July 27, 2023 at 06:58:09 PM GMT+8, Adrian Pop-Tifrea <
> poptifreaadr...@gmail.com> wrote:
>
>
> Hello,
>
> when you said your pandas DataFrame has 10 rows, does that mean it
> contains 10 images? Because if that's the case, then you'd want to only use
> 3 layers of ArrayType when you define the schema.
>
> Best regards,
> Adrian
>
>
>
> On Thu, Jul 27, 2023, 11:04 second_co...@yahoo.com.INVALID
>  wrote:
>
> I have a pandas DataFrame with a column 'image' holding numpy.ndarray
> values. The shape is (500, 333, 3) per image. My pandas DataFrame has
> 10 rows, so the overall shape is (10, 500, 333, 3).
>
> When using spark.createDataFrame(panda_dataframe, schema), I need to
> specify the schema:
>
> schema = StructType([
>     StructField("image",
>                 ArrayType(ArrayType(ArrayType(ArrayType(IntegerType())))),
>                 nullable=False)
> ])
>
>
> I get this error:
>
> raise TypeError(
> TypeError: field image:
> ArrayType(ArrayType(ArrayType(ArrayType(IntegerType(), True), True), True),
> True) can not accept object array([[[14, 14, 14],
>
> ...
>
> Can you advise how to set the schema for an image column of numpy.ndarray?
>
>
>
>


Unsubscribe

2023-08-03 Thread Denys Cherepanin
Unsubscribe


Re: conver panda image column to spark dataframe

2023-08-03 Thread second_co...@yahoo.com.INVALID
 Hello Adrian, 
  here is the snippet 
import pandas as pd
import tensorflow_datasets as tfds
from pyspark.sql.types import (
    ArrayType, IntegerType, StructField, StructType)

(ds_train, ds_test), ds_info = tfds.load(
    dataset_name, data_dir='', split=["train", "test"],
    with_info=True, as_supervised=True
)
schema = StructType([
    StructField("image",
                ArrayType(ArrayType(ArrayType(IntegerType()))),
                nullable=False),
    StructField("label", IntegerType(), nullable=False)
])
pp4 = spark.createDataFrame(
    pd.DataFrame(tfds.as_dataframe(ds_train.take(4), ds_info)), schema)



raised error
TypeError: field image: ArrayType(ArrayType(ArrayType(IntegerType(), True),
True), True) can not accept object array([[[14, 14, 14],
[14, 14, 14],
[14, 14, 14],
...,
[19, 17, 20],
[19, 17, 20],
[19, 17, 20]],
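
As a side note on the schema: the number of ArrayType layers has to match the nesting depth of a single row's value, not of the whole column. Below is a minimal sketch in plain Python, where array_ddl is a hypothetical helper that derives a Spark DDL type string from a nested list. It assumes each NumPy array has first been converted with .tolist(); whether that conversion alone resolves the TypeError above is not confirmed in this thread.

```python
def array_ddl(value, leaf: str = "int") -> str:
    """Return a DDL type string whose array nesting matches `value`."""
    depth = 0
    # Walk down the first element of each level to count the nesting depth.
    while isinstance(value, (list, tuple)):
        depth += 1
        if not value:  # empty level: nothing deeper to inspect
            break
        value = value[0]
    return "array<" * depth + leaf + ">" * depth


# One hypothetical 2x2 RGB image: three levels of nesting per row.
one_image = [[[14, 14, 14], [19, 17, 20]],
             [[0, 0, 0], [1, 2, 3]]]
per_row_type = array_ddl(one_image)

# Treating the whole column as a single value adds a fourth level,
# which is not what a per-row schema should describe.
whole_column_type = array_ddl([one_image, one_image])
```

With one image per row, the per-row type has three array levels, so a schema string along the lines of "image array<array<array<int>>>, label int" should describe the rows.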




On Thursday, August 3, 2023 at 11:34:08 PM GMT+8, Adrian Pop-Tifrea 
 wrote:  
 
 Hello, 

can you also please show us how you created the pandas DataFrame? I mean, how
you added the actual data into the DataFrame. It would help us reproduce
the error.
Thank you,
Pop-Tifrea Adrian

On Mon, Jul 31, 2023 at 5:03 AM second_co...@yahoo.com  
wrote:

 I changed to

ArrayType(ArrayType(ArrayType(IntegerType()))) and still get the same error.
Thank you for responding

On Thursday, July 27, 2023 at 06:58:09 PM GMT+8, Adrian Pop-Tifrea 
 wrote:  
 
 Hello, 
when you said your pandas DataFrame has 10 rows, does that mean it contains 10
images? Because if that's the case, then you'd want to only use 3 layers of
ArrayType when you define the schema.
Best regards,
Adrian


On Thu, Jul 27, 2023, 11:04 second_co...@yahoo.com.INVALID 
 wrote:

I have a pandas DataFrame with a column 'image' holding numpy.ndarray values.
The shape is (500, 333, 3) per image. My pandas DataFrame has 10 rows, so the
overall shape is (10, 500, 333, 3).
When using spark.createDataFrame(panda_dataframe, schema), I need to specify
the schema:

schema = StructType([
    StructField("image",
                ArrayType(ArrayType(ArrayType(ArrayType(IntegerType())))),
                nullable=False)
])

I get this error:
raise TypeError(
TypeError: field image:
ArrayType(ArrayType(ArrayType(ArrayType(IntegerType(), True), True), True),
True) can not accept object array([[[14, 14, 14],...
Can you advise how to set the schema for an image column of numpy.ndarray?



  
  

Re: Interested in contributing to SPARK-24815

2023-08-03 Thread Sean Owen
Formally, an ICLA is required, and you can read more here:
https://www.apache.org/licenses/contributor-agreements.html

In practice, it's unrealistic to collect and verify an ICLA for every PR
contributed by 1000s of people. We have not gated on that.
But, contributions are in all cases governed by the same terms, even
without a signed ICLA. That's the verbiage you're referring to.
A CLA is a good idea, for sure, if there are any questions about the terms
of your contribution.

Here there does seem to be a question - retaining Twilio copyright headers
in source code. That is generally not what would happen for your everyday
contributions to an ASF project, as the copyright header (and CLAs) already
describe the relevant questions of rights: it has been licensed to the ASF.
(There are other situations where retaining a distinct copyright header is
required, typically when adding code licensed under another OSS license,
but I don't think they apply here)

I would say you should review and execute a CCLA for Twilio (assuming you
agree with the terms) to avoid doubt.


On Thu, Aug 3, 2023 at 6:34 PM Rinat Shangeeta 
wrote:

> (Adding my manager Eugene Kim who will cover me as I plan to be out of the
> office soon)
>
> Hi Kent and Sean,
>
> Nice to meet you. I am working on the OSS legal aspects with Pavan who is
> planning to make the contribution request to the Spark project. I saw that
> Sean mentioned in his email that the contributions would be governed under
> the ASF CCLA. In the Spark contribution guidelines, there is no mention of
> having to sign a CCLA. In fact, this is what I found in the contribution
> guidelines:
>
> Contributing code changes
>
> Please review the preceding section before proposing a code change. This
> section documents how to do so.
>
> When you contribute code, you affirm that the contribution is your
> original work and that you license the work to the project under the
> project’s open source license. Whether or not you state this explicitly,
> by submitting any copyrighted material via pull request, email, or other
> means you agree to license the material under the project’s open source
> license and warrant that you have the legal authority to do so.
>
> Can you please point us to an authoritative source about the process?
>
> Also, is there a way to find out if a signed CCLA already exists for
> Twilio from your end? Thanks and appreciate your help!
>
>
> Best,
> Rinat
>
> *Rinat Shangeeta*
> Sr. Patent/Open Source Counsel
>
>
> On Wed, Jul 26, 2023 at 2:27 PM Pavan Kotikalapudi <
> pkotikalap...@twilio.com> wrote:
>
>> Thanks for the response with all the information Sean and Kent.
>>
>> Is there a way to figure out if my employer (Twilio) is part of a CCLA?
>>
>> cc'ing: @Rinat Shangeeta, our Open Source Counsel at Twilio
>>
>> Thank you,
>>
>> Pavan
>>
>> On Tue, Jul 25, 2023 at 10:48 PM Kent Yao  wrote:
>>
>>> Hi Pavan,
>>>
>>> Per the ASF Source Header and Copyright Notice Policy [1], code
>>> directly submitted to the ASF should include the Apache license header
>>> without any additional copyright notice.
>>>
>>>
>>> Kent Yao
>>>
>>> [1] https://www.apache.org/legal/src-headers.html#headers
>>>
>>> On Tue, Jul 25, 2023 at 07:22, Sean Owen wrote:
>>>
>>> >
>>> > When contributing to an ASF project, it's governed by the terms of the
>>> > ASF ICLA: https://www.apache.org/licenses/icla.pdf
>>> > or CCLA: https://www.apache.org/licenses/cla-corporate.pdf
>>> >
>>> > I don't believe ASF projects ever retain an original author copyright
>>> > statement, but rather source files have a statement like:
>>> >
>>> > ...
>>> >  * Licensed to the Apache Software Foundation (ASF) under one or more
>>> >  * contributor license agreements.  See the NOTICE file distributed with
>>> >  * this work for additional information regarding copyright ownership.
>>> > ...
>>> >
>>> > While it's conceivable that such a statement could live in a NOTICE
>>> > file, I don't believe that's been done for any of the thousands of other
>>> > contributors. That's really more for noting the license of
>>> > non-Apache-licensed code. Code directly contributed to the project is
>>> > assumed to have been licensed per above already.
>>> >
>>> > It might be wise to review the CCLA with Twilio and consider
>>> > establishing that to govern contributions.
>>> >
>>> > On Mon, Jul 24, 2023 at 6:10 PM Pavan Kotikalapudi <
>>> > pkotikalap...@twilio.com.invalid> wrote:
>>> >>
>>> >> Hi Spark Dev,
>>> >>
>>> >> My name is Pavan Kotikalapudi, I work at 

Re: Interested in contributing to SPARK-24815

2023-08-03 Thread Rinat Shangeeta
(Adding my manager Eugene Kim who will cover me as I plan to be out of the
office soon)

Hi Kent and Sean,

Nice to meet you. I am working on the OSS legal aspects with Pavan who is
planning to make the contribution request to the Spark project. I saw that
Sean mentioned in his email that the contributions would be governed under
the ASF CCLA. In the Spark contribution guidelines, there is no mention of
having to sign a CCLA. In fact, this is what I found in the contribution
guidelines:

Contributing code changes

Please review the preceding section before proposing a code change. This
section documents how to do so.

When you contribute code, you affirm that the contribution is your original
work and that you license the work to the project under the project’s open
source license. Whether or not you state this explicitly, by submitting any
copyrighted material via pull request, email, or other means you agree to
license the material under the project’s open source license and warrant
that you have the legal authority to do so.

Can you please point us to an authoritative source about the process?

Also, is there a way to find out if a signed CCLA already exists for Twilio
from your end? Thanks and appreciate your help!


Best,
Rinat

*Rinat Shangeeta*
Sr. Patent/Open Source Counsel


On Wed, Jul 26, 2023 at 2:27 PM Pavan Kotikalapudi 
wrote:

> Thanks for the response with all the information Sean and Kent.
>
> Is there a way to figure out if my employer (Twilio) is part of a CCLA?
>
> cc'ing: @Rinat Shangeeta, our Open Source Counsel at Twilio
>
> Thank you,
>
> Pavan
>
> On Tue, Jul 25, 2023 at 10:48 PM Kent Yao  wrote:
>
>> Hi Pavan,
>>
>> Per the ASF Source Header and Copyright Notice Policy [1], code
>> directly submitted to the ASF should include the Apache license header
>> without any additional copyright notice.
>>
>>
>> Kent Yao
>>
>> [1] https://www.apache.org/legal/src-headers.html#headers
>>
>> On Tue, Jul 25, 2023 at 07:22, Sean Owen wrote:
>>
>> >
>> > When contributing to an ASF project, it's governed by the terms of the
>> > ASF ICLA: https://www.apache.org/licenses/icla.pdf
>> > or CCLA: https://www.apache.org/licenses/cla-corporate.pdf
>> >
>> > I don't believe ASF projects ever retain an original author copyright
>> > statement, but rather source files have a statement like:
>> >
>> > ...
>> >  * Licensed to the Apache Software Foundation (ASF) under one or more
>> >  * contributor license agreements.  See the NOTICE file distributed with
>> >  * this work for additional information regarding copyright ownership.
>> > ...
>> >
>> > While it's conceivable that such a statement could live in a NOTICE
>> > file, I don't believe that's been done for any of the thousands of other
>> > contributors. That's really more for noting the license of
>> > non-Apache-licensed code. Code directly contributed to the project is
>> > assumed to have been licensed per above already.
>> >
>> > It might be wise to review the CCLA with Twilio and consider
>> > establishing that to govern contributions.
>> >
>> > On Mon, Jul 24, 2023 at 6:10 PM Pavan Kotikalapudi <
>> > pkotikalap...@twilio.com.invalid> wrote:
>> >>
>> >> Hi Spark Dev,
>> >>
>> >> My name is Pavan Kotikalapudi, I work at Twilio.
>> >>
>> >> I am looking to contribute to this Spark issue:
>> >> https://issues.apache.org/jira/browse/SPARK-24815
>> >>
>> >> There is a clause from the company's OSS saying
>> >>
>> >> - The proposed contribution is about 100 lines of code modification in
>> >> the Spark project, involving two files - this is considered a large
>> >> contribution. An appropriate Twilio copyright notice needs to be added for
>> >> the portion of code that is newly added.
>> >>
>> >> Please let me know if that is acceptable?
>> >>
>> >> Thank you,
>> >>
>> >> Pavan
>> >>
>>
>


Re: conver panda image column to spark dataframe

2023-08-03 Thread Adrian Pop-Tifrea
Hello,

can you also please show us how you created the pandas DataFrame? I mean,
how you added the actual data into the DataFrame. It would help us
reproduce the error.

Thank you,
Pop-Tifrea Adrian

On Mon, Jul 31, 2023 at 5:03 AM second_co...@yahoo.com <
second_co...@yahoo.com> wrote:

> I changed to
>
> ArrayType(ArrayType(ArrayType(IntegerType()))) and still get the same error.
>
> Thank you for responding
>
> On Thursday, July 27, 2023 at 06:58:09 PM GMT+8, Adrian Pop-Tifrea <
> poptifreaadr...@gmail.com> wrote:
>
>
> Hello,
>
> when you said your pandas DataFrame has 10 rows, does that mean it
> contains 10 images? Because if that's the case, then you'd want to only use
> 3 layers of ArrayType when you define the schema.
>
> Best regards,
> Adrian
>
>
>
> On Thu, Jul 27, 2023, 11:04 second_co...@yahoo.com.INVALID
>  wrote:
>
> I have a pandas DataFrame with a column 'image' holding numpy.ndarray
> values. The shape is (500, 333, 3) per image. My pandas DataFrame has
> 10 rows, so the overall shape is (10, 500, 333, 3).
>
> When using spark.createDataFrame(panda_dataframe, schema), I need to
> specify the schema:
>
> schema = StructType([
>     StructField("image",
>                 ArrayType(ArrayType(ArrayType(ArrayType(IntegerType())))),
>                 nullable=False)
> ])
>
>
> I get this error:
>
> raise TypeError(
> TypeError: field image:
> ArrayType(ArrayType(ArrayType(ArrayType(IntegerType(), True), True), True),
> True) can not accept object array([[[14, 14, 14],
>
> ...
>
> Can you advise how to set the schema for an image column of numpy.ndarray?
>
>
>
>


Custom Session Windowing in Spark using Scala/Python

2023-08-03 Thread Ravi Teja
Hi,

I am new to Spark and am looking for help with session windowing in Spark.
I want to create session windows on a user activity stream with a gap
duration of `x` minutes and a maximum window size of `y` hours. I cannot
let Spark keep aggregating a user's events for days before submitting them
to the next step. For example, I want Spark to submit the session window
if there's no activity for 30 minutes, but I also want Spark to submit the
session window once it hits the 5-hour limit. I'd like the solution to be
based only on event time and not processing time. Any help with this is
greatly appreciated. Thank you.
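
For illustration only, here is a minimal sketch of the rule described above (close a session after a gap of inactivity, or once it reaches a maximum length), written in plain Python over one user's event-time timestamps. It is not Spark API code: Session and sessionize are hypothetical names, and timestamps are assumed to be integer seconds. As far as I know, Spark's built-in session_window closes windows on the gap only, so in Structured Streaming logic like this would probably need to live in arbitrary stateful processing (for example flatMapGroupsWithState in Scala or applyInPandasWithState in Python) with an event-time timeout.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Session:
    start: int                      # event time of the first event (seconds)
    end: int                        # event time of the latest event (seconds)
    events: List[int] = field(default_factory=list)


def sessionize(event_times: Iterable[int], gap_s: int,
               max_len_s: int) -> List[Session]:
    """Split one user's event times into sessions.

    A new session starts when the next event arrives more than `gap_s`
    after the previous one, or when including it would stretch the current
    session beyond `max_len_s`. Only event time is used.
    """
    sessions: List[Session] = []
    current: Optional[Session] = None
    for t in sorted(event_times):
        if (current is None
                or t - current.end > gap_s
                or t - current.start > max_len_s):
            current = Session(start=t, end=t)
            sessions.append(current)
        current.end = t
        current.events.append(t)
    return sessions


MIN, HOUR = 60, 3600
# Hypothetical stream: an event every 20 minutes for 6 hours,
# then a 45-minute pause, then one more event.
times = [i * 20 * MIN for i in range(19)] + [18 * 20 * MIN + 45 * MIN]
result = sessionize(times, gap_s=30 * MIN, max_len_s=5 * HOUR)
```

With a 30-minute gap and a 5-hour cap, the first session is cut by the cap even though activity never paused, and the last one is cut by the gap.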

-- 
Best,
RaviTeja