[jira] [Resolved] (ARROW-12730) [MATLAB] Update featherreadmex and featherwritemex to build against latest arrow c++ APIs
[ https://issues.apache.org/jira/browse/ARROW-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kouhei Sutou resolved ARROW-12730.
----------------------------------
    Fix Version/s: 5.0.0
       Resolution: Fixed

Issue resolved by pull request 10305
[https://github.com/apache/arrow/pull/10305]

> [MATLAB] Update featherreadmex and featherwritemex to build against latest arrow c++ APIs
> ------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12730
>                 URL: https://issues.apache.org/jira/browse/ARROW-12730
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: MATLAB
>            Reporter: Sarah Gilmore
>            Assignee: Sarah Gilmore
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The mex functions featherreadmex and featherwritemex currently do not compile if you are using the latest arrow c++ APIs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-13187:
--------------------------------
    Attachment: backward-refs.png

> [c++][python] Possibly memory not deallocated when reading in CSV
> ------------------------------------------------------------------
>
>                 Key: ARROW-13187
>                 URL: https://issues.apache.org/jira/browse/ARROW-13187
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 4.0.1
>            Reporter: Simon
>            Priority: Minor
>         Attachments: backward-refs.png, forward-refs.png
>
> When one reads in a table from CSV in pyarrow version 4.0.1, it appears that the read-in table variable is not freed (or not freed fast enough). I'm unsure whether this is because of pyarrow or because of the way pyarrow memory allocation interacts with Python memory allocation. I encountered it when processing many large CSVs sequentially.
> When I run the following piece of code, RAM usage increases quite rapidly until the process runs out of memory.
> {code:python}
> import pyarrow as pa
> import pyarrow.csv
>
> # Generate some CSV file to read in
> print("Generating CSV")
> with open("example.csv", "w+") as f_out:
>     for i in range(0, 1000):
>         f_out.write("123456789,abc def ghi jkl\n")
>
> def read_in_the_csv():
>     table = pa.csv.read_csv("example.csv")
>     print(table)  # Not strictly necessary to replicate the bug; table can also be an unused variable
>     # This will free up the memory, as a workaround:
>     # table = table.slice(0, 0)
>
> # Read in the CSV many times
> print("Reading in a CSV many times")
> for j in range(10):
>     read_in_the_csv()
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369785#comment-17369785 ]

Weston Pace commented on ARROW-13187:
--------------------------------------

I have tracked down the cause further. I'm not entirely sure what the correct fix should be, but I think it is a problem in Cython.

The issue first occurs after commit 79ae4f6db3dfe06ba2e1b5c285a6695cfa58cf3d (ARROW-8732: [C++] Add basic cancellation API). The method "read_csv" calls "SignalStopHandler()", which calls "signal.getsignal", which calls "signal.py::_int_to_enum", which intentionally triggers a ValueError (as is normal in Python). That ValueError has an associated traceback which is not disposed of correctly. The traceback holds a reference to each frame of the stack, and one of those frames holds a reference to "table". Since a new traceback is generated on every loop iteration, none of the CSV tables are properly disposed of.

The slice call in the original report or "del table" is a workable workaround. As long as the frames aren't too big, the garbage collector will eventually run and clean them up long before much memory is lost.

I have no idea why the ValueError/traceback is not being disposed of. I know Cython has to play some games to manage tracebacks, so it's possible there is an issue there. I believe I created a reproduction in pure Python calling getsignal and it manages memory correctly, so I think CPython itself is in the clear.

I've created a script to reproduce the issue that also uses objgraph to generate reference graphs. It runs only one iteration, so it is quick and doesn't exhaust the system's RAM. It should print 0 as the last line; if there is a leak it prints ~270M.
{code:python}
import gc
import sys

import pyarrow as pa
import pyarrow.csv
import pyarrow.parquet
import objgraph

# Generate some CSV file to read in
print("Generating CSV")
with open("example.csv", "w+") as f_out:
    for i in range(0, 1000):
        unused = f_out.write("123456789,abc def ghi jkl\n")

def read_in_the_csv():
    table = pa.csv.read_csv("example.csv")

print(pa.total_allocated_bytes())

gc.disable()
gc.collect()
objs = gc.get_objects()

read_in_the_csv()

objs2 = gc.get_objects()
offensive_ids = set([id(obj) for obj in objs2]) - set([id(obj) for obj in objs])
badobjs = [obj for obj in objs2 if id(obj) in offensive_ids]
print(len(badobjs))
smallbadobjs = [obj for obj in badobjs
                if 'frame' in str(type(obj)) and 'read_in_the_csv' in str(obj)]
objgraph.show_refs(smallbadobjs, refcounts=True)
objgraph.show_backrefs(smallbadobjs, refcounts=True)

print(pa.total_allocated_bytes())
{code}

So at this point I surrender and ask [~apitrou], [~jorisvandenbossche], or [~amol-] for help :)

*Forward refs show a frame in the traceback still references Table:*

!forward-refs.png!

*Backward refs show the frame is referenced as part of a traceback (note: this graph is truncated and does not show the source ValueError; also, the dict and two lists are from my debugging code and are not related to the issue):*

!backward-refs.png!

--
This message was sent by Atlassian Jira
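The retention mechanism described in the comment above can be sketched in pure Python. This is a hypothetical standalone illustration, not Arrow or Cython code; the Payload class stands in for a large object such as a pyarrow Table, and leaked_exc simulates an exception whose traceback is never disposed of.

```python
import gc
import weakref

class Payload:
    """Stand-in for a large object such as a pyarrow Table."""

leaked_exc = None  # simulates a traceback that is not disposed of correctly

def work():
    payload = Payload()
    ref = weakref.ref(payload)
    try:
        raise ValueError("simulated signal lookup failure")
    except ValueError as exc:
        global leaked_exc
        leaked_exc = exc  # keeping the exception keeps exc.__traceback__ alive
    return ref

ref = work()
# The stored traceback references work()'s frame, and that frame's locals
# still include 'payload', so the object outlives the function call.
assert ref() is not None

leaked_exc = None  # dispose of the exception and its traceback
gc.collect()
assert ref() is None  # the payload is now freed
```

Clearing the stored exception releases the frame and, through it, the payload, which mirrors why "del table" or forcing a collection works around the reported leak.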
[jira] [Updated] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-13187:
--------------------------------
    Attachment: forward-refs.png

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13190) [C++] [Gandiva] Change behavior of INITCAP function
[ https://issues.apache.org/jira/browse/ARROW-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-13190:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] [Gandiva] Change behavior of INITCAP function
> ----------------------------------------------------
>
>                 Key: ARROW-13190
>                 URL: https://issues.apache.org/jira/browse/ARROW-13190
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++ - Gandiva
>            Reporter: Anthony Louis Gotlib Ferreira
>            Assignee: Anthony Louis Gotlib Ferreira
>            Priority: Trivial
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current behavior of the *INITCAP* function is to uppercase the first character of each word and leave the remaining characters as they are.
> The desired behavior is to uppercase the first letter of each word and lowercase the rest. Any non-alphanumeric character should be treated as a word separator.
> That behavior is based on these database systems:
> * [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
> * [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
> * [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
> * [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13191) [Go] Support external schema in ipc readers
[ https://issues.apache.org/jira/browse/ARROW-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-13191:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Go] Support external schema in ipc readers
> --------------------------------------------
>
>                 Key: ARROW-13191
>                 URL: https://issues.apache.org/jira/browse/ARROW-13191
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Go
>            Reporter: Seth Hollyman
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Apologies if I'm imprecise here; I'm still coming up to speed on the Arrow details.)
> The IPC message format describes how data and metadata messages are encapsulated, but it is not a requirement that each message include the schema.
> In Go, github.com/apache/arrow/go/arrow/ipc contains NewReader() for setting up reading of IPC messages, and it accepts the WithSchema option to pass a schema into said reader. However, the implementation merely uses that information to verify that the schema it reads from the IPC stream matches the passed-in schema. This request is to allow WithSchema to behave as expected and use the option-provided Schema for performing reads.
> The one gotcha here appears to be the dictionary type map, which is currently retained independently of the schema but is part of the internal readSchema() setup. Completeness may warrant another option for communicating those externally as well; or perhaps the option-passed Schema should be documented as not supporting dictionary types.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-13191) [Go] Support external schema in ipc readers
Seth Hollyman created ARROW-13191:
-------------------------------------
             Summary: [Go] Support external schema in ipc readers
                 Key: ARROW-13191
                 URL: https://issues.apache.org/jira/browse/ARROW-13191
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Go
            Reporter: Seth Hollyman

(Apologies if I'm imprecise here; I'm still coming up to speed on the Arrow details.)

The IPC message format describes how data and metadata messages are encapsulated, but it is not a requirement that each message include the schema.

In Go, github.com/apache/arrow/go/arrow/ipc contains NewReader() for setting up reading of IPC messages, and it accepts the WithSchema option to pass a schema into said reader. However, the implementation merely uses that information to verify that the schema it reads from the IPC stream matches the passed-in schema. This request is to allow WithSchema to behave as expected and use the option-provided Schema for performing reads.

The one gotcha here appears to be the dictionary type map, which is currently retained independently of the schema but is part of the internal readSchema() setup. Completeness may warrant another option for communicating those externally as well; or perhaps the option-passed Schema should be documented as not supporting dictionary types.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13190) [C++] [Gandiva] Change behavior of INITCAP function
[ https://issues.apache.org/jira/browse/ARROW-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Louis Gotlib Ferreira updated ARROW-13190:
--------------------------------------------------
    Description: 
The current behavior of the *INITCAP* function is to turn the first character of each word uppercase and remains the other as is.

The desired behavior is to turn the first letter uppercase and the other lowercase. Any character except the alphanumeric ones should be considered as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

  was:
The current behavior of the `INITCAP` function is to turn the first character of each word uppercase and remains the other as is.

The desired behavior is to turn the first letter uppercase and the other lowercase. Any character except the alphanumeric ones should be considered as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13190) [C++] [Gandiva] Change behavior of INITCAP function
[ https://issues.apache.org/jira/browse/ARROW-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Louis Gotlib Ferreira updated ARROW-13190:
--------------------------------------------------
    Description: 
The current behavior of the `INITCAP` function is to turn the first character of each word uppercase and remains the other as is.

The desired behavior is to turn the first letter uppercase and the other lowercase. Any character except the alphanumeric ones should be considered as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

  was:
The current behavior of the `INITCAP` function is to turn the first character of each word uppercase and remains the other as is.

The desired behavior is to turn the first letter uppercase and the other lowercase. Any character except the alphanumeric ones should be considered as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-13190) [C++] [Gandiva] Change behavior of INITCAP function
Anthony Louis Gotlib Ferreira created ARROW-13190:
-----------------------------------------------------
             Summary: [C++] [Gandiva] Change behavior of INITCAP function
                 Key: ARROW-13190
                 URL: https://issues.apache.org/jira/browse/ARROW-13190
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++ - Gandiva
            Reporter: Anthony Louis Gotlib Ferreira
            Assignee: Anthony Louis Gotlib Ferreira

The current behavior of the `INITCAP` function is to uppercase the first character of each word and leave the remaining characters as they are.

The desired behavior is to uppercase the first letter of each word and lowercase the rest. Any non-alphanumeric character should be treated as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
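The desired semantics can be modeled with a short Python sketch. This is a hypothetical illustration of the behavior described in the issue, not the Gandiva implementation; the function name initcap is chosen only for clarity.

```python
import re

def initcap(text: str) -> str:
    """Uppercase the first character of each alphanumeric run and
    lowercase the rest; every non-alphanumeric character acts as a
    word separator and passes through unchanged."""
    def cap(match: re.Match) -> str:
        word = match.group(0)
        return word[0].upper() + word[1:].lower()
    # Words are maximal alphanumeric runs; separators are left as-is.
    return re.sub(r"[0-9A-Za-z]+", cap, text)

print(initcap("heLLo-WORLD foo_bar"))  # Hello-World Foo_Bar
```

Note that both the hyphen and the underscore split words here, matching the "any non-alphanumeric character is a separator" rule the listed databases follow.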
[jira] [Resolved] (ARROW-13119) [R] Set empty schema in scalar Expressions
[ https://issues.apache.org/jira/browse/ARROW-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Cook resolved ARROW-13119.
------------------------------
    Resolution: Fixed

Resolved by https://github.com/apache/arrow/pull/10563

> [R] Set empty schema in scalar Expressions
> -------------------------------------------
>
>                 Key: ARROW-13119
>                 URL: https://issues.apache.org/jira/browse/ARROW-13119
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Assignee: Ian Cook
>            Priority: Major
>             Fix For: 5.0.0
>
> Closely related to ARROW-13117 is the problem of {{type()}} and {{type_id()}} not working for scalar Expressions. For example, currently this happens:
> {code:r}
> > Expression$scalar("foo")$type()
> Error: !is.null(schema) is not TRUE
> > Expression$scalar(42L)$type()
> Error: !is.null(schema) is not TRUE
> {code}
> This is what we want to happen:
> {code:r}
> > Expression$scalar("foo")$type()
> Utf8
> string
> > Expression$scalar(42L)$type()
> Int32
> int32
> {code}
> This is simple to solve; we just need to set {{schema}} to an empty schema for all scalar Expressions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (ARROW-13117) [R] Retain schema in new Expressions
[ https://issues.apache.org/jira/browse/ARROW-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Cook resolved ARROW-13117.
------------------------------
    Resolution: Fixed

Issue resolved by pull request 10563
[https://github.com/apache/arrow/pull/10563]

> [R] Retain schema in new Expressions
> -------------------------------------
>
>                 Key: ARROW-13117
>                 URL: https://issues.apache.org/jira/browse/ARROW-13117
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Assignee: Ian Cook
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When a new Expression is created, {{schema}} should be retained from the expression(s) it was created from. That way, the {{type()}} and {{type_id()}} methods of the new Expression will work. For example, currently this happens:
> {code:r}
> > x <- Expression$field_ref("x")
> > x$schema <- Schema$create(x = int32())
> >
> > y <- Expression$field_ref("y")
> > y$schema <- Schema$create(y = int32())
> >
> > Expression$create("add_checked", x, y)$type()
> Error: !is.null(schema) is not TRUE
> {code}
> This is what we want to happen:
> {code:r}
> > Expression$create("add_checked", x, y)$type()
> Int32
> int32
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369707#comment-17369707 ]

Weston Pace commented on ARROW-13187:
--------------------------------------

Also, it seems this does not happen when repeatedly reading in a parquet file. So maybe the problem isn't in the shared Arrow->Python code, or maybe it's particular to the way the CSV reader creates the table.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-13187:
--------------------------------
    Component/s: C++

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-13187:
--------------------------------
    Summary: [c++][python] Possibly memory not deallocated when reading in CSV  (was: Possibly memory not deallocated when reading in CSV)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-13189) [R] Should we be handling row-level metadata at all?
[ https://issues.apache.org/jira/browse/ARROW-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369703#comment-17369703 ]

Neal Richardson commented on ARROW-13189:
------------------------------------------

I think we should ignore row-level metadata in general and (here lies the bigger task) provide an interface (via S3 methods, most likely) for people to define custom boxing/unboxing of custom data types where our general metadata handling is insufficient or suboptimal. This is essentially allowing R developers to define Extension Types.

> [R] Should we be handling row-level metadata at all?
> -----------------------------------------------------
>
>                 Key: ARROW-13189
>                 URL: https://issues.apache.org/jira/browse/ARROW-13189
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 3.0.0, 4.0.0, 4.0.1
>            Reporter: Jonathan Keane
>            Priority: Major
>
> In order to support things like sf columns, we have added code that handles row-level metadata (https://github.com/apache/arrow/pull/8549 and https://github.com/apache/arrow/pull/9182).
> This works just fine in a single-table or single-parquet-file circumstance, but when using a dataset (even without filtering!) it can produce some surprising (and wrong) results (see the reprex below).
> There is already some work underway in ARROW-12542 to make it easier to convert the row-element-level attributes to a struct and store it in the column, but that's still a bit off. Even once that's done, should we disable this entirely? Stop, or ignore and warn, given that with datasets row-level metadata isn't applied correctly (there's no way for us to get the ordering right)? Something else?
> {code:r}
> library(arrow)
>
> df <- tibble::tibble(
>   part = rep(1:2, 13),
>   let = letters
> )
>
> df$embedded_attr <- lapply(seq_len(nrow(df)), function(i) {
>   value <- "nothing"
>   attributes(value) <- list(letter = df[[i, "let"]])
>   value
> })
>
> df_from_tab <- as.data.frame(Table$create(df))
>
> # this should be (and is) "b"
> attributes(df_from_tab[df_from_tab$let == "b", "embedded_attr"][[1]][[1]])
> #> $letter
> #> [1] "b"
>
> # the dfs are the same
> waldo::compare(df, df_from_tab)
> #> ✓ No differences
>
> # now via dataset
> dir <- "ds-dir"
> write_dataset(df, path = dir, partitioning = "part")
> ds <- open_dataset(dir)
> df_from_ds <- dplyr::collect(ds)
>
> # this should be (and is not) "b"
> attributes(df_from_ds[df_from_ds$let == "b", "embedded_attr"][[1]][[1]])
> #> $letter
> #> [1] "n"
>
> # Even controlling for order, the dfs are not the same
> waldo::compare(dplyr::arrange(df, let), dplyr::arrange(df_from_ds, let))
> #> `names(old)`: "part" "let" "embedded_attr"
> #> `names(new)`:        "let" "embedded_attr" "part"
> #>
> #> `attr(old$embedded_attr[[2]], 'letter')`: "b"
> #> `attr(new$embedded_attr[[2]], 'letter')`: "n"
> #>
> #> `attr(old$embedded_attr[[3]], 'letter')`: "c"
> #> `attr(new$embedded_attr[[3]], 'letter')`: "b"
> #>
> #> `attr(old$embedded_attr[[4]], 'letter')`: "d"
> #> `attr(new$embedded_attr[[4]], 'letter')`: "o"
> #>
> #> `attr(old$embedded_attr[[5]], 'letter')`: "e"
> #> `attr(new$embedded_attr[[5]], 'letter')`: "c"
> #>
> #> `attr(old$embedded_attr[[6]], 'letter')`: "f"
> #> `attr(new$embedded_attr[[6]], 'letter')`: "p"
> #>
> #> `attr(old$embedded_attr[[7]], 'letter')`: "g"
> #> `attr(new$embedded_attr[[7]], 'letter')`: "d"
> #>
> #> `attr(old$embedded_attr[[8]], 'letter')`: "h"
> #> `attr(new$embedded_attr[[8]], 'letter')`: "q"
> #>
> #> `attr(old$embedded_attr[[9]], 'letter')`: "i"
> #> `attr(new$embedded_attr[[9]], 'letter')`: "e"
> #>
> #> `attr(old$embedded_attr[[10]], 'letter')`: "j"
> #> `attr(new$embedded_attr[[10]], 'letter')`: "r"
> #>
> #> And 15 more differences ...
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-13187) Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369702#comment-17369702 ] Weston Pace commented on ARROW-13187: - Great reproduction, thank you. I can reproduce this on 4.0.0 but not on 3.0.0. A few observations so far: pa.total_allocated_bytes is increasing so it is not a dynamic allocator blowup issue. "del table" prevents the out-of-ram (same as the table.slice above). "gc.collect" prevents the out-of-ram Those workarounds shouldn't be necessary however. When read_in_the_csv exits the table is no longer needed, it's refcount should decrease by 1, and it should be eligible for garbage collection. Combined with the fact that this doesn't occur on 3.0.0 (both environments are using python 3.8 although 3.8.6 vs 3.8.8 but I doubt it's a python change) I think this means that a circular reference was introduced in the Arrow->Python code between 3.0.0 and 4.0.0. > Possibly memory not deallocated when reading in CSV > --- > > Key: ARROW-13187 > URL: https://issues.apache.org/jira/browse/ARROW-13187 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 4.0.1 >Reporter: Simon >Priority: Minor > > When one reads in a table from CSV in pyarrow version 4.0.1, it appears that > the read-in table variable is not freed (or not fast enough). I'm unsure if > this is because of pyarrow or because of the way pyarrow memory allocation > interacts with Python memory allocation. I encountered it when processing > many large CSVs sequentially. > When I run the following piece of code, the RAM memory usage increases quite > rapidly until it runs out of memory. 
> {code:python} > import pyarrow as pa > import pyarrow.csv > # Generate some CSV file to read in > print("Generating CSV") > with open("example.csv", "w+") as f_out: > for i in range(0, 1000): > f_out.write("123456789,abc def ghi jkl\n") > def read_in_the_csv(): > table = pa.csv.read_csv("example.csv") > print(table) # Not strictly necessary to replicate bug, table can also > be an unused variable > # This will free up the memory, as a workaround: > # table = table.slice(0, 0) > # Read in the CSV many times > print("Reading in a CSV many times") > for j in range(10): > read_in_the_csv() > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
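The circular-reference hypothesis above can be illustrated without pyarrow at all, using Python's `gc` module: an object caught in a reference cycle is not freed by refcounting when its name goes out of scope, only by the cyclic garbage collector. This is a minimal sketch (the `Table` class here is a stand-in, not a pyarrow Table):

```python
import gc

class Table:
    """Stand-in for an object caught in a reference cycle."""
    pass

def make_cycle():
    t = Table()
    t.self_ref = t  # refcount never reaches zero on its own
    # t goes out of scope here, but only the cyclic GC can reclaim it

gc.disable()                # simulate "the cyclic GC hasn't run yet"
for _ in range(100):
    make_cycle()
unreachable = gc.collect()  # an explicit collect reclaims the cycles
gc.enable()
print(unreachable >= 100)   # True: at least one object per cycle was stuck
```

This mirrors why `gc.collect()` works as a workaround in the report: the tables were reachable only through a cycle, so plain refcounting never released them.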
[jira] [Updated] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-13174: --- Labels: timestamp (was: ) > [C++][Compute] Add strftime kernel > -- > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > Labels: timestamp > > To convert timestamps to a string representation with an arbitrary format we > require a strftime kernel (the inverse operation of the {{strptime}} kernel > we already have). > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369653#comment-17369653 ] Rok Mihevc commented on ARROW-13174: This should also implement TemporalStrftimeOptions with format and locale properties. > [C++][Compute] Add strftime kernel > -- > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > To convert timestamps to a string representation with an arbitrary format we > require a strftime kernel (the inverse operation of the {{strptime}} kernel > we already have). > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13186) [R] Implement type determination more cleanly
[ https://issues.apache.org/jira/browse/ARROW-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369622#comment-17369622 ] Ian Cook commented on ARROW-13186: -- Nice, thanks [~npr]. Yes, using {{eval_select}} across the board is ARROW-12105. I hope to get that done for 5.0.0. > [R] Implement type determination more cleanly > - > > Key: ARROW-13186 > URL: https://issues.apache.org/jira/browse/ARROW-13186 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 5.0.0 >Reporter: Ian Cook >Priority: Major > > In the R package, there are several improvements in data type determination > in the 5.0.0 release. The implementation of these improvements used a kludge: > They made it possible to store a {{Schema}} in an {{Expression}} object in > the R package; when set, this {{Schema}} is retained in derivative > {{Expression}} objects. This was the most convenient way to make the > {{Schema}} available for passing it to the {{type_id()}} method, which > requires it. But this introduces a deviation of the R package's > {{Expression}} object from the C++ library's {{Expression}} object, and it > makes our type determination functions work differently than the other R > functions in {{nse_funcs}}. > The Jira issues in which these somewhat kludgy improvements were made are: > * allowing a schema to be stored in the {{Expression}} object, and > implementing type determination functions in a way that uses that schema > (ARROW-12781) > * retaining a schema in derivative {{Expression}} objects (ARROW-13117) > * setting an empty schema in scalar literal {{Expression}} objects > (ARROW-13119) > From the perspective of the R package, an ideal way to implement type > determination functions would be to call a {{type_id}} kernel through the > {{call_function}} interface, but this was rejected in ARROW-13167. Consider > other ways that we might improve this implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13173) [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
[ https://issues.apache.org/jira/browse/ARROW-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13173: --- Labels: pull-request-available (was: ) > [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally > - > > Key: ARROW-13173 > URL: https://issues.apache.org/jira/browse/ARROW-13173 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 4.0.0, 4.0.1 >Reporter: Yibo Cai >Assignee: Weston Pace >Priority: Major > Labels: pull-request-available > Fix For: 5.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Observed one test case failure from Travis CI arm64 job. > https://travis-ci.com/github/apache/arrow/jobs/518630381#L2271 > {{TestAsyncUtil.ReadaheadFailed}} asserted at > https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_generator_test.cc#L1131 > Looks _SleepABit()_ cannot guarantee that _finished_ will be set in time, > especially on busy CI hosts where many jobs share one machine. > cc [~westonpace] -- This message was sent by Atlassian Jira (v8.3.4#803005)
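The test flakiness described above is the classic fixed-sleep race: a single `SleepABit()` can lose on an overloaded CI host. A deadline-based poll is the usual fix; here is a minimal Python sketch of the pattern (the Arrow test itself is C++, so this only illustrates the idea):

```python
import threading
import time

def wait_for(predicate, timeout=5.0, interval=0.01):
    """Poll until predicate() is true or the deadline passes.

    Unlike one fixed-length sleep, a deadline tolerates arbitrary
    scheduling delay on busy machines, up to the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()

finished = threading.Event()
threading.Timer(0.05, finished.set).start()  # set asynchronously, like the test's flag
print(wait_for(finished.is_set))  # True
```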
[jira] [Created] (ARROW-13189) [R] Should we be handling row-level metadata at all?
Jonathan Keane created ARROW-13189: -- Summary: [R] Should we be handling row-level metadata at all? Key: ARROW-13189 URL: https://issues.apache.org/jira/browse/ARROW-13189 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 4.0.1, 4.0.0, 3.0.0 Reporter: Jonathan Keane In order to support things like SF columns, we have added code that handles row-level metadata (https://github.com/apache/arrow/pull/8549 and https://github.com/apache/arrow/pull/9182). These work just fine in a single table or single parquet file circumstance, but when using a dataset (even without filtering!) this can produce some surprising (and wrong) results (see reprex below). There is already some work underway to make it easier to convert the row-element-level attributes to a struct + store it in the column in the ARROW-12542 work, but that's still a bit off. But even once that's done, should we disable this totally? Stop or ignore+warn that with datasets row-level metadata isn't applied (since there's no way for us to get the ordering right)? Something else? 
{code:r} library(arrow) df <- tibble::tibble( part = rep(1:2, 13), let = letters ) df$embedded_attr <- lapply(seq_len(nrow(df)), function(i) { value <- "nothing" attributes(value) <- list(letter = df[[i, "let"]]) value }) df_from_tab <- as.data.frame(Table$create(df)) # this should be (and is) "b" attributes(df_from_tab[df_from_tab$let == "b", "embedded_attr"][[1]][[1]]) #> $letter #> [1] "b" # the dfs are the same waldo::compare(df, df_from_tab) #> ✓ No differences # now via dataset dir <- "ds-dir" write_dataset(df, path = dir, partitioning = "part") ds <- open_dataset(dir) df_from_ds <- dplyr::collect(ds) # this should be (and is not) "b" attributes(df_from_ds[df_from_ds$let == "b", "embedded_attr"][[1]][[1]]) #> $letter #> [1] "n" # Even controlling for order, the dfs are not the same waldo::compare(dplyr::arrange(df, let), dplyr::arrange(df_from_ds, let)) #> `names(old)`: "part" "let" "embedded_attr" #> `names(new)`:"let" "embedded_attr" "part" #> #> `attr(old$embedded_attr[[2]], 'letter')`: "b" #> `attr(new$embedded_attr[[2]], 'letter')`: "n" #> #> `attr(old$embedded_attr[[3]], 'letter')`: "c" #> `attr(new$embedded_attr[[3]], 'letter')`: "b" #> #> `attr(old$embedded_attr[[4]], 'letter')`: "d" #> `attr(new$embedded_attr[[4]], 'letter')`: "o" #> #> `attr(old$embedded_attr[[5]], 'letter')`: "e" #> `attr(new$embedded_attr[[5]], 'letter')`: "c" #> #> `attr(old$embedded_attr[[6]], 'letter')`: "f" #> `attr(new$embedded_attr[[6]], 'letter')`: "p" #> #> `attr(old$embedded_attr[[7]], 'letter')`: "g" #> `attr(new$embedded_attr[[7]], 'letter')`: "d" #> #> `attr(old$embedded_attr[[8]], 'letter')`: "h" #> `attr(new$embedded_attr[[8]], 'letter')`: "q" #> #> `attr(old$embedded_attr[[9]], 'letter')`: "i" #> `attr(new$embedded_attr[[9]], 'letter')`: "e" #> #> `attr(old$embedded_attr[[10]], 'letter')`: "j" #> `attr(new$embedded_attr[[10]], 'letter')`: "r" #> #> And 15 more differences ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
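The failure mode in the reprex above can be reduced to a language-agnostic sketch: row-level metadata is stored by original row position, but a partitioned dataset write groups rows by partition key, so reattaching metadata positionally after a dataset read mislabels rows. A hypothetical miniature in Python:

```python
# Rows of (value, partition_key); metadata is stored by original position,
# mimicking how row-level attributes are kept in the R metadata.
rows = [("a", 1), ("b", 2), ("c", 1), ("d", 2)]
metadata = [letter for letter, _ in rows]

# A partitioned write groups rows by partition key, changing row order.
reordered = sorted(rows, key=lambda r: r[1])

# Reapplying metadata by position now attaches the wrong labels.
reapplied = list(zip([r[0] for r in reordered], metadata))
print(reapplied)  # [('a', 'a'), ('c', 'b'), ('b', 'c'), ('d', 'd')]
```

The middle two rows get each other's metadata, which is exactly the kind of scrambling `waldo::compare()` reports in the reprex.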
[jira] [Commented] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369614#comment-17369614 ] Mauricio 'Pachá' Vargas Sepúlveda commented on ARROW-13188: --- right, closing now > [R] [C++] Implement substr/str_sub for dplyr queries > > > Key: ARROW-13188 > URL: https://issues.apache.org/jira/browse/ARROW-13188 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.1 >Reporter: Mauricio 'Pachá' Vargas Sepúlveda >Priority: Minor > > I would be highly desirable to be able to use (base) substr and/or (stringr) > str_sub in dplyr queries, like > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} > Of course, this needs implementation, but similar to ARROW-13107, points to > an easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12992) [R] bindings for substr
[ https://issues.apache.org/jira/browse/ARROW-12992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mauricio 'Pachá' Vargas Sepúlveda updated ARROW-12992: -- Description: Followup to ARROW-10557, which implemented the C++ current state: {code:r} library(arrow) library(dplyr) library(stringr) # get animal products, year 20919 open_dataset( "../cepii-datasets-arrow/parquet/baci_hs92", partitioning = c("year", "reporter_iso") ) %>% filter( year == 2019, str_sub(product_code, 1, 2) == "01" ) %>% collect() Error: Filter expression not supported for Arrow Datasets: str_sub(product_code, 1, 2) == "01" Call collect() first to pull data into R. {code} was:Followup to ARROW-10557, which implemented the C++ > [R] bindings for substr > --- > > Key: ARROW-12992 > URL: https://issues.apache.org/jira/browse/ARROW-12992 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Neal Richardson >Priority: Major > Fix For: 5.0.0 > > > Followup to ARROW-10557, which implemented the C++ > current state: > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mauricio 'Pachá' Vargas Sepúlveda closed ARROW-13188. - Resolution: Duplicate > [R] [C++] Implement substr/str_sub for dplyr queries > > > Key: ARROW-13188 > URL: https://issues.apache.org/jira/browse/ARROW-13188 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.1 >Reporter: Mauricio 'Pachá' Vargas Sepúlveda >Priority: Minor > > I would be highly desirable to be able to use (base) substr and/or (stringr) > str_sub in dplyr queries, like > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} > Of course, this needs implementation, but similar to ARROW-13107, points to > an easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369607#comment-17369607 ] Ian Cook commented on ARROW-13188: -- Dup of ARROW-12992? > [R] [C++] Implement substr/str_sub for dplyr queries > > > Key: ARROW-13188 > URL: https://issues.apache.org/jira/browse/ARROW-13188 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.1 >Reporter: Mauricio 'Pachá' Vargas Sepúlveda >Priority: Minor > > I would be highly desirable to be able to use (base) substr and/or (stringr) > str_sub in dplyr queries, like > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} > Of course, this needs implementation, but similar to ARROW-13107, points to > an easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13186) [R] Implement type determination more cleanly
[ https://issues.apache.org/jira/browse/ARROW-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369591#comment-17369591 ] Neal Richardson commented on ARROW-13186: - I did some experimenting and got something that works for the arrow_mask/arrow_eval code paths, but any paths that use tidyselect::eval_select (currently only relocate but presumably others will be added) need slightly different handling and I didn't get the chance to work out a solution there yet. The idea is that we stick the schema as a "data pronoun" like thing in the data mask, so that any functions called inside arrow_eval() can call up and find it. {code} diff --git a/r/R/dplyr-eval.R b/r/R/dplyr-eval.R index de68d2f2c..eda40dc23 100644 --- a/r/R/dplyr-eval.R +++ b/r/R/dplyr-eval.R @@ -86,9 +86,6 @@ arrow_mask <- function(.data) { f_env[[f]] <- fail } - # Assign the schema to the expressions - map(.data$selected_columns, ~(.$schema <- .data$.data$schema)) - # Add the column references and make the mask out <- new_data_mask( new_environment(.data$selected_columns, parent = f_env), @@ -98,5 +95,18 @@ arrow_mask <- function(.data) { # TODO: figure out what rlang::as_data_pronoun does/why we should use it # (because if we do we get `Error: Can't modify the data pronoun` in mutate()) out$.data <- .data$selected_columns + out$.schema <- .data$.data$schema out } + +arrow_eval_schema <- function() { + n <- 1 + env <- parent.frame(n) + while(!identical(env, .GlobalEnv)) { +if (".schema" %in% ls(env, all.names = TRUE)) { + return(get(".schema", env)) +} +n <- n + 1 +env <- parent.frame(n) + } +} {code} Then each of the is* functions calls arrow_eval_schema() to get it. The benefit of something like this is that we avoid the cost of tracking/merging schemas when building expressions and only have to grab it when we need it (which is rarely since none of the other compute kernels require it). 
> [R] Implement type determination more cleanly > - > > Key: ARROW-13186 > URL: https://issues.apache.org/jira/browse/ARROW-13186 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 5.0.0 >Reporter: Ian Cook >Priority: Major > > In the R package, there are several improvements in data type determination > in the 5.0.0 release. The implementation of these improvements used a kludge: > They made it possible to store a {{Schema}} in an {{Expression}} object in > the R package; when set, this {{Schema}} is retained in derivative > {{Expression}} objects. This was the most convenient way to make the > {{Schema}} available for passing it to the {{type_id()}} method, which > requires it. But this introduces a deviation of the R package's > {{Expression}} object from the C++ library's {{Expression}} object, and it > makes our type determination functions work differently than the other R > functions in {{nse_funcs}}. > The Jira issues in which these somewhat kludgy improvements were made are: > * allowing a schema to be stored in the {{Expression}} object, and > implementing type determination functions in a way that uses that schema > (ARROW-12781) > * retaining a schema in derivative {{Expression}} objects (ARROW-13117) > * setting an empty schema in scalar literal {{Expression}} objects > (ARROW-13119) > From the perspective of the R package, an ideal way to implement type > determination functions would be to call a {{type_id}} kernel through the > {{call_function}} interface, but this was rejected in ARROW-13167. Consider > other ways that we might improve this implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
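The frame-walking trick in the `arrow_eval_schema()` sketch above has a direct analogue in Python, which may make the idea easier to evaluate: climb the calling frames until a sentinel binding is found, so only the functions that need the schema pay for looking it up. The names here are hypothetical:

```python
import inspect

def find_in_callers(name):
    """Walk outward through calling frames looking for a binding,
    similar in spirit to climbing R parent frames for `.schema`."""
    frame = inspect.currentframe().f_back
    while frame is not None:
        if name in frame.f_locals:
            return frame.f_locals[name]
        frame = frame.f_back
    raise LookupError(name)

def evaluate_with_schema():
    _schema = "part: int32, let: string"  # sentinel placed by the evaluator
    return type_id_helper()

def type_id_helper():
    # A helper deep in the call stack recovers the schema on demand,
    # without it being threaded through every expression.
    return find_in_callers("_schema")

print(evaluate_with_schema())  # part: int32, let: string
```

The trade-off is the same one Neal notes: lookup cost is paid only by the rare functions that need the schema, at the price of an implicit, stack-dependent channel.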
[jira] [Updated] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mauricio 'Pachá' Vargas Sepúlveda updated ARROW-13188: -- Summary: [R] [C++] Implement substr/str_sub for dplyr queries (was: [R] [C++] Implement SQL-alike distinct() for dplyr queries) > [R] [C++] Implement substr/str_sub for dplyr queries > > > Key: ARROW-13188 > URL: https://issues.apache.org/jira/browse/ARROW-13188 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.1 >Reporter: Mauricio 'Pachá' Vargas Sepúlveda >Priority: Minor > > I would be highly desirable to be able to use (base) substr and/or (stringr) > str_sub in dplyr queries, like > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} > Of course, this needs implementation, but similar to ARROW-13107, points to > an easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13188) [R] [C++] Implement SQL-alike distinct() for dplyr queries
Mauricio 'Pachá' Vargas Sepúlveda created ARROW-13188: - Summary: [R] [C++] Implement SQL-alike distinct() for dplyr queries Key: ARROW-13188 URL: https://issues.apache.org/jira/browse/ARROW-13188 Project: Apache Arrow Issue Type: Bug Components: C++, R Affects Versions: 4.0.1 Reporter: Mauricio 'Pachá' Vargas Sepúlveda It would be highly desirable to be able to use (base) substr and/or (stringr) str_sub in dplyr queries, like {code:r} library(arrow) library(dplyr) library(stringr) # get animal products, year 2019 open_dataset( "../cepii-datasets-arrow/parquet/baci_hs92", partitioning = c("year", "reporter_iso") ) %>% filter( year == 2019, str_sub(product_code, 1, 2) == "01" ) %>% collect() Error: Filter expression not supported for Arrow Datasets: str_sub(product_code, 1, 2) == "01" Call collect() first to pull data into R. {code} Of course, this needs implementation, but, similar to ARROW-13107, it points to easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-13173) [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
[ https://issues.apache.org/jira/browse/ARROW-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace reassigned ARROW-13173: --- Assignee: Weston Pace > [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally > - > > Key: ARROW-13173 > URL: https://issues.apache.org/jira/browse/ARROW-13173 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 4.0.0, 4.0.1 >Reporter: Yibo Cai >Assignee: Weston Pace >Priority: Major > Fix For: 5.0.0 > > > Observed one test case failure from Travis CI arm64 job. > https://travis-ci.com/github/apache/arrow/jobs/518630381#L2271 > {{TestAsyncUtil.ReadaheadFailed}} asserted at > https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_generator_test.cc#L1131 > Looks _SleepABit()_ cannot guarantee that _finished_ will be set in time, > especially on busy CI hosts where many jobs share one machine. > cc [~westonpace] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13187) Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon updated ARROW-13187: -- Description: When one reads in a table from CSV in pyarrow version 4.0.1, it appears that the read-in table variable is not freed (or not fast enough). I'm unsure if this is because of pyarrow or because of the way pyarrow memory allocation interacts with Python memory allocation. I encountered it when processing many large CSVs sequentially. When I run the following piece of code, the RAM memory usage increases quite rapidly until it runs out of memory. {code:python} import pyarrow as pa import pyarrow.csv # Generate some CSV file to read in print("Generating CSV") with open("example.csv", "w+") as f_out: for i in range(0, 1000): f_out.write("123456789,abc def ghi jkl\n") def read_in_the_csv(): table = pa.csv.read_csv("example.csv") print(table) # Not strictly necessary to replicate bug, table can also be an unused variable # This will free up the memory, as a workaround: # table = table.slice(0, 0) # Read in the CSV many times print("Reading in a CSV many times") for j in range(10): read_in_the_csv() {code} was: When one reads in a table from CSV in pyarrow version 4.0.1, it appears that the read-in table variable is not freed (or not fast enough). I'm unsure if this is because of pyarrow or because of the way pyarrow memory allocation interacts with Python memory allocation. I encountered it when processing many large CSVs sequentially. When I run the following piece of code, the RAM memory usage increases quite rapidly until it runs out of memory. 
{code:python} import pyarrow as pa import pyarrow.csv # Generate some CSV file to read in print("Generating CSV") with open("example.csv", "w+") as f_out: for i in range(0, 1000): f_out.write("123456789,abc def ghi jkl\n") def read_in_the_csv(): table = pa.csv.read_csv("example.csv") print(table) # Not strictly necessary to replicate bug, table can also be an unused variable # This will free up the memory, as a workaround: # table = table.slice(0, 0) # Read in the print("Reading in a CSV many times") for j in range(10): read_in_the_csv() {code} > Possibly memory not deallocated when reading in CSV > --- > > Key: ARROW-13187 > URL: https://issues.apache.org/jira/browse/ARROW-13187 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 4.0.1 >Reporter: Simon >Priority: Minor > > When one reads in a table from CSV in pyarrow version 4.0.1, it appears that > the read-in table variable is not freed (or not fast enough). I'm unsure if > this is because of pyarrow or because of the way pyarrow memory allocation > interacts with Python memory allocation. I encountered it when processing > many large CSVs sequentially. > When I run the following piece of code, the RAM memory usage increases quite > rapidly until it runs out of memory. > {code:python} > import pyarrow as pa > import pyarrow.csv > # Generate some CSV file to read in > print("Generating CSV") > with open("example.csv", "w+") as f_out: > for i in range(0, 1000): > f_out.write("123456789,abc def ghi jkl\n") > def read_in_the_csv(): > table = pa.csv.read_csv("example.csv") > print(table) # Not strictly necessary to replicate bug, table can also > be an unused variable > # This will free up the memory, as a workaround: > # table = table.slice(0, 0) > # Read in the CSV many times > print("Reading in a CSV many times") > for j in range(10): > read_in_the_csv() > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13187) Possibly memory not deallocated when reading in CSV
Simon created ARROW-13187: - Summary: Possibly memory not deallocated when reading in CSV Key: ARROW-13187 URL: https://issues.apache.org/jira/browse/ARROW-13187 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 4.0.1 Reporter: Simon When one reads in a table from CSV in pyarrow version 4.0.1, it appears that the read-in table variable is not freed (or not fast enough). I'm unsure if this is because of pyarrow or because of the way pyarrow memory allocation interacts with Python memory allocation. I encountered it when processing many large CSVs sequentially. When I run the following piece of code, the RAM memory usage increases quite rapidly until it runs out of memory. {code:python} import pyarrow as pa import pyarrow.csv # Generate some CSV file to read in print("Generating CSV") with open("example.csv", "w+") as f_out: for i in range(0, 1000): f_out.write("123456789,abc def ghi jkl\n") def read_in_the_csv(): table = pa.csv.read_csv("example.csv") print(table) # Not strictly necessary to replicate bug, table can also be an unused variable # This will free up the memory, as a workaround: # table = table.slice(0, 0) # Read in the print("Reading in a CSV many times") for j in range(10): read_in_the_csv() {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12137) [R] New/improved vignette on dplyr features
[ https://issues.apache.org/jira/browse/ARROW-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated ARROW-12137: - Fix Version/s: (was: 5.0.0) 6.0.0 > [R] New/improved vignette on dplyr features > --- > > Key: ARROW-12137 > URL: https://issues.apache.org/jira/browse/ARROW-12137 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Neal Richardson >Assignee: Ian Cook >Priority: Major > Fix For: 6.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13151) [Python] Unable to read single child field of struct column from Parquet
[ https://issues.apache.org/jira/browse/ARROW-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369582#comment-17369582 ] Jim Pivarski commented on ARROW-13151: -- Great, thank you! I see now that you're calling it a "bug" was commenting on Joris's question about whether it ought to be supported, and that's what I was responding to. When this is fixed, it will be a new minimum version of Arrow for us because of its importance in our work. (As a side note, if you do change the ugly "list.item" access, we'll have to adjust, because of course we're generating column names to request them like that. So if that changes, we'll definitely need to pin a minimum Arrow version because the new names would be incompatible. I'd prefer it not to change; after all, it's what's in the Parquet schema. Maybe "synonyms" could hide that feature from high-level users, though that complicates the interface.) > [Python] Unable to read single child field of struct column from Parquet > > > Key: ARROW-13151 > URL: https://issues.apache.org/jira/browse/ARROW-13151 > Project: Apache Arrow > Issue Type: Bug > Components: Parquet, Python >Reporter: Angus Hollands >Priority: Major > > Given the following table > {code:java} > data = {"root": [[{"addr": {"this": 3, "that": 3}}]]} > table = pa.Table.from_pydict(data) > {code} > reading the nested column leads to a `pyarrow.lib.ArrowInvalid` error: > {code} > pq.write_table(table, "/tmp/table.parquet") > file = pq.ParquetFile("/tmp/table.parquet") > array = file.read(["root.list.item.addr.that"]) > {code} > Traceback: > {code} > Traceback (most recent call last): > File "", line 21, in > array = file.read(["root.list.item.addr.that"]) > File > "/home/angus/.mambaforge/envs/awkward/lib/python3.9/site-packages/pyarrow/parquet.py", > line 383, in read > return self.reader.read_all(column_indices=column_indices, > File "pyarrow/_parquet.pyx", line 1097, in > pyarrow._parquet.ParquetReader.read_all > File "pyarrow/error.pxi", 
line 97, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: List child array invalid: Invalid: Struct child > array #0 does not match type field: struct vs struct int64, this: int64> > {code} > It's possible that I don't quite understand this properly - am I doing > something wrong? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10998) [C++] Filesystems: detect if URI is passed where a file path is required and raise informative error
[ https://issues.apache.org/jira/browse/ARROW-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated ARROW-10998: - Fix Version/s: (was: 5.0.0) 6.0.0 > [C++] Filesystems: detect if URI is passed where a file path is required and > raise informative error > > > Key: ARROW-10998 > URL: https://issues.apache.org/jira/browse/ARROW-10998 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Joris Van den Bossche >Assignee: Ian Cook >Priority: Major > Labels: filesystem > Fix For: 6.0.0 > > > Currently, when passing a URI to a filesystem method (except for > {{from_uri}}) or other functions that accept a filesystem object, you can get > a rather cryptic error message (eg in this case about "No response body" for > S3, in the example below). > Ideally, the filesystem object knows its own prefix "scheme", and so can > detect if a user is passing a URI instead of file path, and we can provide a > nicer error message. > Example with S3: > {code:python} > >>> from pyarrow.fs import S3FileSystem > >>> fs = S3FileSystem(region="us-east-2") > >>> fs.get_file_info('s3://ursa-labs-taxi-data/2016/01/') > ... > OSError: When getting information for key '/ursa-labs-taxi-data/2016/01' in > bucket 's3:': AWS Error [code 100]: No response body. > >>> import pyarrow.parquet as pq > >>> table = pq.read_table('s3://ursa-labs-taxi-data/2016/01/data.parquet', > >>> filesystem=fs) > ... > OSError: When getting information for key > '/ursa-labs-taxi-data/2016/01/data.parquet' in bucket 's3:': AWS Error [code > 100]: No response body. > {code} > With a local filesystem, you actually get a not found file: > {code: python} > >>> fs = LocalFileSystem() > >>> fs.get_file_info("file:///home") > > {code} > cc [~apitrou] -- This message was sent by Atlassian Jira (v8.3.4#803005)
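The proposed check ("the filesystem object knows its own prefix scheme") amounts to scheme detection on the incoming string. A heuristic sketch in Python, assuming nothing about the eventual C++ implementation:

```python
from urllib.parse import urlparse

def looks_like_uri(path):
    """If a string handed to a filesystem method carries a URI scheme
    (s3://, file://, ...), an informative error can be raised instead of
    treating 's3:' as part of the path, as in the report above."""
    scheme = urlparse(path).scheme
    return len(scheme) > 1  # > 1 so Windows drive letters ('C:\\...') pass

for p in ["s3://ursa-labs-taxi-data/2016/01/", "file:///home", "/home/user"]:
    print(p, looks_like_uri(p))
```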
[jira] [Commented] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369567#comment-17369567 ] Gert Hulselmans commented on ARROW-10344: - Combined the above snippets in a cleaner way: https://github.com/aertslab/create_cisTarget_databases/commit/dcf70e60e915d2dc6850343960e7a7d3d3d56c41 > [Python] Get all columns names (or schema) from Feather file, before loading > whole Feather file > > > Key: ARROW-10344 > URL: https://issues.apache.org/jira/browse/ARROW-10344 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Affects Versions: 1.0.1 >Reporter: Gert Hulselmans >Priority: Major > > Is there a way to get all column names (or schema) from a Feather file before > loading the full Feather file? > My Feather files are big (like 100GB) and the names of the columns are > different per analysis and can't be hard coded. > {code:python} > import pyarrow.feather as feather > # Code here to check which columns are in the feather file. > ... > my_columns = ... > # Result is pandas.DataFrame > read_df = feather.read_feather('/path/to/file', columns=my_columns) > # Result is pyarrow.Table > read_arrow = feather.read_table('/path/to/file', columns=my_columns) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-12904) [Rust] Unable to load Feather v2 files created by pyarrow and pandas.
[ https://issues.apache.org/jira/browse/ARROW-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gert Hulselmans closed ARROW-12904. --- Resolution: Information Provided > [Rust] Unable to load Feather v2 files created by pyarrow and pandas. > - > > Key: ARROW-12904 > URL: https://issues.apache.org/jira/browse/ARROW-12904 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 4.0.1 > Environment: Ubuntu 20.04 >Reporter: Gert Hulselmans >Assignee: Joris Van den Bossche >Priority: Major > > arrow-rs seems unable to load Feather v2 files created by pyarrow (and > pandas), while it can read Feather v2 created by itself. > More info at: > [https://github.com/apache/arrow-rs/issues/286] > > Any idea what is missing in the Rust implementation (missing part of the > spec?)? > > {code:java} > More details: in both files, I am getting the following: > Reading Utf8 > field_node: FieldNode { length: 7, null_count: 0 } > offset buffer: Buffer { offset: 200, length: 55 } > offsets: [32, 0, 407708164, 545407072, 8388608, 67108864, 134217728, > 201326592] > values buffer: Buffer { offset: 256, length: 51 } > offsets[0] != 0 indicates a problem: offsets are expected to start from zero > on any array with offsets. > offsets[i+1] < offsets[i] for some i, which indicates a problem: offsets > are expected to be monotonically increasing > I do not have a root cause yet, these are just observations. > {code} > https://github.com/apache/arrow-rs/issues/286#issuecomment-839524898 > > In the attachment the following files can be found.
> {code:java} > test_pandas.feather: Original Feather file > test_arrow.feather: loading test_pandas.feather with pyarrow and saving with > pyarrow: df_pa = pa.feather.read_feather('test_pandas.feather') > test_polars.feather: Loading test_pandas.feather with pyarrow and saving > with polars (only this one can be read by arrow-rs) > test_pandas_from_polars.feather: Loading test_polars.feather with polars and > using the to_pandas option. > {code} > > [^test_feather_file.zip] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12904) [Rust] Unable to load Feather v2 files created by pyarrow and pandas.
[ https://issues.apache.org/jira/browse/ARROW-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369563#comment-17369563 ] Gert Hulselmans commented on ARROW-12904: - Looks like it was caused by the LZ4 compression used in the Feather file, which arrow-rs does not detect properly. > [Rust] Unable to load Feather v2 files created by pyarrow and pandas. > - > > Key: ARROW-12904 > URL: https://issues.apache.org/jira/browse/ARROW-12904 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 4.0.1 > Environment: Ubuntu 20.04 >Reporter: Gert Hulselmans >Assignee: Joris Van den Bossche >Priority: Major > > arrow-rs seems unable to load Feather v2 files created by pyarrow (and > pandas), while it can read Feather v2 created by itself. > More info at: > [https://github.com/apache/arrow-rs/issues/286] > > Any idea what is missing in the Rust implementation (missing part of the > spec?)? > > {code:java} > More details: in both files, I am getting the following: > Reading Utf8 > field_node: FieldNode { length: 7, null_count: 0 } > offset buffer: Buffer { offset: 200, length: 55 } > offsets: [32, 0, 407708164, 545407072, 8388608, 67108864, 134217728, > 201326592] > values buffer: Buffer { offset: 256, length: 51 } > offsets[0] != 0 indicates a problem: offsets are expected to start from zero > on any array with offsets. > offsets[i+1] < offsets[i] for some i, which indicates a problem: offsets > are expected to be monotonically increasing > I do not have a root cause yet, these are just observations. > {code} > https://github.com/apache/arrow-rs/issues/286#issuecomment-839524898 > > In the attachment the following files can be found.
> {code:java} > test_pandas.feather: Original Feather file > test_arrow.feather: loading test_pandas.feather with pyarrow and saving with > pyarrow: df_pa = pa.feather.read_feather('test_pandas.feather') > test_polars.feather: Loading test_pandas.feather with pyarrow and saving > with polars (only this one can be read by arrow-rs) > test_pandas_from_polars.feather: Loading test_polars.feather with polars and > using the to_pandas option. > {code} > > [^test_feather_file.zip] -- This message was sent by Atlassian Jira (v8.3.4#803005)
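If the LZ4 compression in the Feather file is indeed the culprit, a possible workaround on the writing side is sketched below: pyarrow's Feather v2 writer compresses with LZ4 by default, but {{compression="uncompressed"}} produces a plain Arrow IPC file that a reader without LZ4 support should be able to load:

{code:python}
import pyarrow as pa
import pyarrow.feather as feather

table = pa.table({"x": [1, 2, 3]})

# Default Feather v2 writes use LZ4; writing uncompressed produces a plain
# Arrow IPC file readable without any compression codec support.
feather.write_feather(table, "test_uncompressed.feather",
                      compression="uncompressed")

roundtrip = feather.read_table("test_uncompressed.feather")
print(roundtrip.num_rows)  # 3
{code}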
[jira] [Commented] (ARROW-13151) [Python] Unable to read single child field of struct column from Parquet
[ https://issues.apache.org/jira/browse/ARROW-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369562#comment-17369562 ] Micah Kornfield commented on ARROW-13151: - It should be very much supported. Like I said, this is a bug. It will take some tracing to figure out why it is occurring. > [Python] Unable to read single child field of struct column from Parquet > > > Key: ARROW-13151 > URL: https://issues.apache.org/jira/browse/ARROW-13151 > Project: Apache Arrow > Issue Type: Bug > Components: Parquet, Python >Reporter: Angus Hollands >Priority: Major > > Given the following table > {code:java} > data = {"root": [[{"addr": {"this": 3, "that": 3}}]]} > table = pa.Table.from_pydict(data) > {code} > reading the nested column leads to a `pyarrow.lib.ArrowInvalid` error: > {code} > pq.write_table(table, "/tmp/table.parquet") > file = pq.ParquetFile("/tmp/table.parquet") > array = file.read(["root.list.item.addr.that"]) > {code} > Traceback: > {code} > Traceback (most recent call last): > File "", line 21, in > array = file.read(["root.list.item.addr.that"]) > File > "/home/angus/.mambaforge/envs/awkward/lib/python3.9/site-packages/pyarrow/parquet.py", > line 383, in read > return self.reader.read_all(column_indices=column_indices, > File "pyarrow/_parquet.pyx", line 1097, in > pyarrow._parquet.ParquetReader.read_all > File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: List child array invalid: Invalid: Struct child > array #0 does not match type field: struct<that: int64> vs struct<that: int64, this: int64> > {code} > It's possible that I don't quite understand this properly - am I doing > something wrong? -- This message was sent by Atlassian Jira (v8.3.4#803005)
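Until the bug is traced, a workaround sketch is to read the whole struct column and project the child field in memory. This does not avoid the extra I/O that selective reading would save; it only avoids the error:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

data = {"root": [[{"addr": {"this": 3, "that": 3}}]]}
pq.write_table(pa.Table.from_pydict(data), "table.parquet")

# Read the full column, then drill into the struct in memory.
full = pq.read_table("table.parquet")
lists = full.column("root").chunk(0)   # ListArray of structs
addr = lists.flatten().field("addr")   # StructArray with fields 'this'/'that'
that = addr.field("that")              # Int64Array
print(that.to_pylist())  # [3]
{code}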
[jira] [Commented] (ARROW-13117) [R] Retain schema in new Expressions
[ https://issues.apache.org/jira/browse/ARROW-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369552#comment-17369552 ] Ian Cook commented on ARROW-13117: -- In recognition of the following... * Enabling the data type of an {{Expression}} to be knowable at all times is important for enabling broader support for expressions in dplyr verbs. * The PR here and the earlier changes in ARROW-12781 enable that, but in a somewhat kludgy way. * As kludges go, this one is not so bad, and it would be straightforward to replace with a cleaner implementation in the future. * At present, there is no clear way to implement this more cleanly, at least not without doing a major refactor or compromising its functionality. ... I created ARROW-13186 for future consideration of ways to implement this more cleanly, and for now I will merge this PR. > [R] Retain schema in new Expressions > > > Key: ARROW-13117 > URL: https://issues.apache.org/jira/browse/ARROW-13117 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > Fix For: 5.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When a new Expression is created, {{schema}} should be retained from the > expression(s) it was created from. That way, the {{type()}} and {{type_id()}} > methods of the new Expression will work. For example, currently this happens: > {code:r} > > x <- Expression$field_ref("x") > > x$schema <- Schema$create(x = int32()) > > > > y <- Expression$field_ref("y") > > y$schema <- Schema$create(y = int32()) > > > > Expression$create("add_checked", x, y)$type() > Error: !is.null(schema) is not TRUE {code} > This is what we want to happen: > {code:r} > > Expression$create("add_checked", x, y)$type() > Int32 > int32 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13186) [R] Implement type determination more cleanly
[ https://issues.apache.org/jira/browse/ARROW-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated ARROW-13186: - Description: In the R package, there are several improvements in data type determination in the 5.0.0 release. The implementation of these improvements used a kludge: They made it possible to store a {{Schema}} in an {{Expression}} object in the R package; when set, this {{Schema}} is retained in derivative {{Expression}} objects. This was the most convenient way to make the {{Schema}} available for passing it to the {{type_id()}} method, which requires it. But this introduces a deviation of the R package's {{Expression}} object from the C++ library's {{Expression}} object, and it makes our type determination functions work differently than the other R functions in {{nse_funcs}}. The Jira issues in which these somewhat kludgy improvements were made are: * allowing a schema to be stored in the {{Expression}} object, and implementing type determination functions in a way that uses that schema (ARROW-12781) * retaining a schema in derivative {{Expression}} objects (ARROW-13117) * setting an empty schema in scalar literal {{Expression}} objects (ARROW-13119) >From the perspective of the R package, an ideal way to implement type >determination functions would be to call a {{type_id}} kernel through the >{{call_function}} interface, but this was rejected in ARROW-13167. Consider >other ways that we might improve this implementation. was: In the R package, there are several improvements in data type determination in the 5.0.0 release. The implementation of these improvements used a kludge: They made it possible to store a {{Schema}} in an {{Expression}} object in the R package; when set, this {{Schema}} is retained in derivative {{Expression}}s. This was the most convenient way to make the {{Schema}} available for passing it to the {{type_id()}} method, which requires it. 
But this introduces a deviation of the R package's {{Expression}} object from the C++ library's {{Expression}} object, and it makes our type determination functions work differently than the other R functions in {{nse_funcs}}. The Jira issues in which these somewhat kludgy improvements were made are: * allowing a schema to be stored in the {{Expression}} object, and implementing type determination functions in a way that uses that schema (ARROW-12781) * retaining a schema in derivative {{Expression}} objects (ARROW-13117) * setting an empty schema in scalar literal {{Expression}} objects (ARROW-13119) >From the perspective of the R package, an ideal way to implement type >determination functions would be to call a {{type_id}} kernel through the >{{call_function}} interface, but this was rejected in ARROW-13167. Consider >other ways that we might improve this implementation. > [R] Implement type determination more cleanly > - > > Key: ARROW-13186 > URL: https://issues.apache.org/jira/browse/ARROW-13186 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 5.0.0 >Reporter: Ian Cook >Priority: Major > > In the R package, there are several improvements in data type determination > in the 5.0.0 release. The implementation of these improvements used a kludge: > They made it possible to store a {{Schema}} in an {{Expression}} object in > the R package; when set, this {{Schema}} is retained in derivative > {{Expression}} objects. This was the most convenient way to make the > {{Schema}} available for passing it to the {{type_id()}} method, which > requires it. But this introduces a deviation of the R package's > {{Expression}} object from the C++ library's {{Expression}} object, and it > makes our type determination functions work differently than the other R > functions in {{nse_funcs}}. 
> The Jira issues in which these somewhat kludgy improvements were made are: > * allowing a schema to be stored in the {{Expression}} object, and > implementing type determination functions in a way that uses that schema > (ARROW-12781) > * retaining a schema in derivative {{Expression}} objects (ARROW-13117) > * setting an empty schema in scalar literal {{Expression}} objects > (ARROW-13119) > From the perspective of the R package, an ideal way to implement type > determination functions would be to call a {{type_id}} kernel through the > {{call_function}} interface, but this was rejected in ARROW-13167. Consider > other ways that we might improve this implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13186) [R] Implement type determination more cleanly
Ian Cook created ARROW-13186: Summary: [R] Implement type determination more cleanly Key: ARROW-13186 URL: https://issues.apache.org/jira/browse/ARROW-13186 Project: Apache Arrow Issue Type: Improvement Components: R Affects Versions: 5.0.0 Reporter: Ian Cook In the R package, there are several improvements in data type determination in the 5.0.0 release. The implementation of these improvements used a kludge: They made it possible to store a {{Schema}} in an {{Expression}} object in the R package; when set, this {{Schema}} is retained in derivative {{Expression}}s. This was the most convenient way to make the {{Schema}} available for passing it to the {{type_id()}} method, which requires it. But this introduces a deviation of the R package's {{Expression}} object from the C++ library's {{Expression}} object, and it makes our type determination functions work differently than the other R functions in {{nse_funcs}}. The Jira issues in which these somewhat kludgy improvements were made are: * allowing a schema to be stored in the {{Expression}} object, and implementing type determination functions in a way that uses that schema (ARROW-12781) * retaining a schema in derivative {{Expression}} objects (ARROW-13117) * setting an empty schema in scalar literal {{Expression}} objects (ARROW-13119) >From the perspective of the R package, an ideal way to implement type >determination functions would be to call a {{type_id}} kernel through the >{{call_function}} interface, but this was rejected in ARROW-13167. Consider >other ways that we might improve this implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13151) [Python] Unable to read single child field of struct column from Parquet
[ https://issues.apache.org/jira/browse/ARROW-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369517#comment-17369517 ] Jim Pivarski commented on ARROW-13151: -- I hope reading a single field of a struct column is supported! It's an important use-case for us. In particle physics, our data consist of many collision events, each with a variable-length number of particles, and each particle is a struct with many fields. Often, there's even deeper structure than that, but this is the basic structure. These structs are very wide, with as many as a hundred fields, because the same dataset is used by 3000 authors, all doing different analyses on the same input dataset. Most individual data analysts don't access more than 10% of these struct fields. Therefore, it's important to be able to read the data lazily (in interactive analysis) or at least selectively (in high-throughput applications). Reading and decompressing data are often bottlenecks, so restricting data-loading to just the data we use is by itself a 10× improvement. We have a custom file format (ROOT) that is designed to provide exactly this selective reading, but we've been looking at Parquet as a more cross-language and non-domain-specific alternative. The bug that Angus reported arose in a framework that provides lazy-reading, Awkward Array's [ak.from_parquet|https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_parquet.html] function, which uses pyarrow.parquet.ParquetFile to read the data and convert it to Arrow, then converts the Arrow into Awkward Arrays (which are highly interchangeable with Arrow Arrays; conversion in both directions is usually zero-copy). [This whole feature|https://github.com/scikit-hep/awkward-1.0/blob/1ecfc3e29aaf1b79cd7e0e8fa1598452f3827c64/src/awkward/operations/convert.py#L3122-L3959] was designed around the idea that you can read individual struct fields, just as you can read individual columns. 
Just today, I found out that's not true, even in our basic case that does not trigger errors like Angus's:
{code:python}
>>> pq.write_table(pa.Table.from_pydict({"events": [{"muons": [{"pt": 10.5, "eta": -1.5, "phi": 0.1}]}]}), "/tmp/testy.parquet")
>>> pq.ParquetFile("/tmp/testy.parquet").read(["events.muons.list.item.pt"])  # reads all three
pyarrow.Table
events: struct<muons: list<item: struct<eta: double, phi: double, pt: double>>>
  child 0, muons: list<item: struct<eta: double, phi: double, pt: double>>
    child 0, item: struct<eta: double, phi: double, pt: double>
      child 0, eta: double
      child 1, phi: double
      child 2, pt: double
>>> pq.ParquetFile("/tmp/testy.parquet").read(["events.muons.list.item.eta"])  # reads all three
pyarrow.Table
events: struct<muons: list<item: struct<eta: double, phi: double, pt: double>>>
  child 0, muons: list<item: struct<eta: double, phi: double, pt: double>>
    child 0, item: struct<eta: double, phi: double, pt: double>
      child 0, eta: double
      child 1, phi: double
      child 2, pt: double
>>> pq.ParquetFile("/tmp/testy.parquet").read(["events.muons.list.item.phi"])  # reads all three
pyarrow.Table
events: struct<muons: list<item: struct<eta: double, phi: double, pt: double>>>
  child 0, muons: list<item: struct<eta: double, phi: double, pt: double>>
    child 0, item: struct<eta: double, phi: double, pt: double>
      child 0, eta: double
      child 1, phi: double
      child 2, pt: double
{code}
I hadn't realized that our attempts to read only "muon pt" or only "muon eta" were, in fact, reading all muon fields. (In the real datasets, muons have 32 fields, electrons have 47, taus have 37, jets have 30, photons have 27...) We could try to rearrange data to something shallower:
{code:python}
>>> pq.write_table(pa.Table.from_pydict({"muons": [{"pt": 10.5, "eta": -1.5, "phi": 0.1}]}), "/tmp/testy.parquet")
>>> pq.ParquetFile("/tmp/testy.parquet").read(["muons.pt"])
pyarrow.Table
muons: struct<pt: double>
  child 0, pt: double
>>> pq.ParquetFile("/tmp/testy.parquet").read(["muons.eta"])
pyarrow.Table
muons: struct<eta: double>
  child 0, eta: double
>>> pq.ParquetFile("/tmp/testy.parquet").read(["muons.phi"])
pyarrow.Table
muons: struct<phi: double>
  child 0, phi: double
{code}
but that puts a hard-to-predict constraint on data structures. In the above, aren't we "reading a single column of a struct column?" 
(I probably saw this behavior and assumed that it would continue to deeper structures, which is how I never noticed that they sometimes read all struct fields.) As a real-world case, here's a dataset that naturally has a structure that suffers from over-reading. It's not physics-related: it's a translation of the [Million Song Dataset|http://millionsongdataset.com/] into Parquet (side-note: it's losslessly 3× smaller than the original HDF5 files because of all the variable-length data): s3://pivarski-princeton/millionsongs/ . Lazily loading it has odd performance characteristics that I hadn't measured in detail until now:
{code:python}
In [1]: import awkward as ak

In [2]: songs = ak.from_parquet("/home/jpivarski/storage/data/million-song-dataset/full/millionsongs/millionsongs-A-zstd.parquet", lazy=True)

In [3]: %time songs.analysis.segments.loudness_start
CPU times: user 19.1 ms, sys: 0 ns, total: 19.1 ms
Wall time: 18.8 ms
Out[3]:

In [4]: %time songs.analysis.segments.loudness_max
CPU
{code}
[jira] [Created] (ARROW-13185) [MATLAB] Consider alternatives to placing the MEX binaries within the source tree
Sarah Gilmore created ARROW-13185: - Summary: [MATLAB] Consider alternatives to placing the MEX binaries within the source tree Key: ARROW-13185 URL: https://issues.apache.org/jira/browse/ARROW-13185 Project: Apache Arrow Issue Type: Task Components: MATLAB Reporter: Sarah Gilmore Since modifying the source directory via the build process is generally considered non-optimal, we may want to explore alternative approaches. For example, during the build process, we could create a derived source tree (a copy of the original source tree) within the build area and place our build artifacts within the derived source tree. Then, we could add the derived source tree to the MATLAB search path. That's just one option, but there are others we could explore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-13125) [R] Throw error when 2+ args passed to desc() in arrange()
[ https://issues.apache.org/jira/browse/ARROW-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook resolved ARROW-13125. -- Resolution: Fixed Issue resolved by pull request 10559 [https://github.com/apache/arrow/pull/10559] > [R] Throw error when 2+ args passed to desc() in arrange() > -- > > Key: ARROW-13125 > URL: https://issues.apache.org/jira/browse/ARROW-13125 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 4.0.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Minor > Labels: pull-request-available > Fix For: 5.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently this does not result in an error, but it should: > {code:r}Table$create(x = 1:3, y = 4:6) %>% arrange(desc(x, y)){code} > The same problem affects dplyr on R data frames. I opened > https://github.com/tidyverse/dplyr/issues/5921 for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13175) Technology Trends That Will f9zone Dominate 2017
[ https://issues.apache.org/jira/browse/ARROW-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13175: --- > Technology Trends That Will f9zone Dominate 2017 > > > Key: ARROW-13175 > URL: https://issues.apache.org/jira/browse/ARROW-13175 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major
[jira] [Deleted] (ARROW-13179) Impact of Smart Technology on Data friv4school
[ https://issues.apache.org/jira/browse/ARROW-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13179: --- > Impact of Smart Technology on Data friv4school > -- > > Key: ARROW-13179 > URL: https://issues.apache.org/jira/browse/ARROW-13179 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13180) Teaching With f95 Technology
[ https://issues.apache.org/jira/browse/ARROW-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13180: --- > Teaching With f95 Technology > > > Key: ARROW-13180 > URL: https://issues.apache.org/jira/browse/ARROW-13180 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Teaching with technology helps to expand student learning by assisting instructional objectives. However, it can be challenging to select the best technology tools without losing sight of the goal for student learning. An expert can find creative and constructive ways to integrate technology into a class.
> What do we mean by technology? The term technology refers to the development of the techniques and tools we use to solve problems or achieve goals. Technology can encompass all kinds of tools, from low-tech pencils, paper and chalkboards to presentation software, or high-tech tablets, online collaboration and conference tools and more. The newest technologies allow us to try things in physical and virtual classrooms that were not possible before.
> How can technology help students? Technology can help a student in the following ways:
> 1. Online collaboration tools: Technology has helped students and instructors share documents online, edit them in real time and project them on a screen. This gives students a collaborative platform in which to brainstorm ideas and document their work using text and pictures.
> 2. Presentation software: This enables the instructor to embed high-resolution photographs, diagrams, videos and sound files to augment the text and verbal lecture content.
> 3. Tablets: Tablets can be linked to computers, projectors and the cloud so that students and instructors can communicate through text, drawings and diagrams.
> 4. Course management tools: These allow instructors to organize all the resources students need for the class: the syllabus, assignments, readings and online quizzes.
> 5. Smartphones: These are a quick and easy way to survey students during class. They enable instant polling, which can quickly assess students' understanding and help instructors adjust pace and content.
> 6. Lecture capture tools: These allow instructors to record lectures directly from their computers without elaborate or additional classroom equipment, and students can review the recorded lectures at their own pace.
> What are the advantages of technology integration in the education sphere? Teaching strategies based on educational technology facilitate student learning and boost their capacity, productivity and performance. Technology integration inspires positive changes in teaching methods on an international level. The following list of benefits will help in reaching a final conclusion:
> 1. Technology makes teaching easy: Technology has power. It enables the use of projectors and computer presentations to deliver *[f95|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* any type of lesson or instruction and improves the level of comprehension within the class, rather than giving theoretical explanations that students cannot understand.
> 2. It facilitates student progress: Technology has given teachers platforms and tools that let them keep track of individual achievements.
> 3. Education technology is good for the environment: If all schools committed to using digital textbooks, can you imagine the amount of paper and the number of trees that would be saved? Students can be instructed to take tests online and submit their papers and homework through email. They can also be encouraged to use e-readers to read through the assigned literature.
> 4. It has made students enjoy learning: Students enjoy learning through their attachment to Facebook, Instagram, Digg and other websites from a very early age. The internet can distract them from the learning process, but learning can be made enjoyable by setting up a private Facebook group for the class and inspiring constructive conversations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13176) T Is for Technology in Triathlon Training
[ https://issues.apache.org/jira/browse/ARROW-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13176: --- > T Is for Technology in Triathlon Training > - > > Key: ARROW-13176 > URL: https://issues.apache.org/jira/browse/ARROW-13176 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> The original triathletes were amazing. Dave Scott and Mark Allen accomplished amazing feats in triathlon long before technology took over the sport. They didn't have the metrics we have today, and they certainly didn't have all of the information-gathering abilities we have. Yet they set records and competed valiantly. In fact, Mark Allen still holds the marathon record in Kona to this day. Technology is a great friend to triathletes, but it does have a downside.
> TECHNOLOGY ITEMS
> Technology has taken over every part of triathlon. One of the most widely researched areas is the triathlon watch. Each and every year there are new watches available for purchase with ever-increasing measurements for the triathlete. My personal favorite is the Garmin 910XT. This watch gives me heart rate, power (with a power meter), pacing (with an optional foot pod), speed, cadence (with an optional cadence sensor), mileage, yards in swimming, and much more. Each of these measurements aids me in gauging my success or failure in each and every training session and race.
> Technology has been making huge strides in bicycles and wheel sets. The amount of research going into these two items within the world of triathlon is incredible. Each and every year there are new and exciting advances in the aerodynamics of bicycles and wheel sets. Much of the time these technologies take on two very different vantage points. This was most evident at the 2016 World Championships in Kona. Diamond Bikes unveiled their Andean bike, which fills in all the space between the front tire and the back tire with a solid piece so the wind passes by this area for aerodynamics. Another bike debuted at Kona this year with the exact opposite idea: the Ventum bike eliminated the down tube and left a vacant space between the front tire and the back tire, with only the top tube remaining. These are two very different ideas about aerodynamics. This is one of the amazing things about the advancement of technology, and one of the downsides as well.
> Each and every piece of equipment in triathlon is undergoing constant technological advancement. Shoes, wetsuits, socks, nutrition, hats, sunglasses, helmets, racing kits, and anything else you can imagine. This world of technology in triathlon is nowhere near completion and will continue to push the limits.
> THE UPSIDE TO TECHNOLOGY
> Technology in triathlon is amazing. These new items are exciting and make each and every year different. There are new advancements that help triathletes go faster and longer. These new technologies help even the amateur triathlete go faster. Just the purchase of new wheels can mean the difference between being on or off the podium. The advancement of shoes has helped many athletes avoid the injuries that plague so many, such as plantar fasciitis. Technology will continue to help the sport become better and better.
> THE DOWNSIDE TO TECHNOLOGY
> The downside to technology is that the amateur triathlete arrives at their local race already incapable of winning because someone else had the money to buy the latest technology. The biggest purchases, such as wheel sets and bicycles, can be cost-prohibitive for the average triathlete, and yet *[friv.com|https://complextime.com/friv-everything-you-need-to-know-about-it/]* there are individuals who purchase these items at alarming rates. The amateur triathlete can also feel overwhelmed about what to purchase and what not to purchase. Some items of technology are not worth the extra cost because they do not decrease racing time significantly enough for what they cost. Now that these new technologies have been out a while, knock-offs have begun to appear as lower-cost items. It will be interesting to watch the flood of these knock-offs into the market and see how it affects the big players in technology.
> If you are an amateur triathlete, shop smart and don't buy new gadgets just because they are new. Make sure to invest in items that will truly make you faster, not just gimmicks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13183) How Has Technology Changed f95z Our Lives?
[ https://issues.apache.org/jira/browse/ARROW-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13183: --- > How Has Technology Changed f95z Our Lives? > -- > > Key: ARROW-13183 > URL: https://issues.apache.org/jira/browse/ARROW-13183 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> In the midst of the darkness that engulfed the world, technology changed the entire life of human beings. Undoubtedly, technology has some negative repercussions, but its positive results carry more weight than the negative ones. However, it seems a little difficult for us to believe that technology has changed our life, because it has taken its place slowly and gradually. The innumerable justifications spotlighted below show how technology has changed our life entirely.
> Education System
> Education is a broad field, but if we take only a single aspect, the way of learning, we can see what a great difference technology has made to our life. For instance, when we were young, it was hard to get a good education with a variety of examples, and we used to buy different expensive books just for the sake of a few topics, to make notes and get good marks in our exams. In this technological world, however, it has become very easy to access different topics on the internet in a very short span of time, and they can also be shared with friends on social media.
> Business System
> In earlier times, it was very difficult to advertise a newly launched business with outdated methods such as pasting posters on walls, distributing pamphlets to people in a busy market, etc. In this contemporary world, however, technology has made it very easy to advertise our business in different places, such as on internet sites, on social media, on big LCDs along busy roads, etc. This is how our life has changed thanks to technical assistance, and we can easily promote our business in no time.
> Medical Department
> Besides the field of business, the medical department is at its peak just because of technology. In earlier times, malaria, a fatal disease, cost many people their lives, but now malaria, which is caused by Plasmodium, can easily be treated without any risk. Similarly, medical *[f95z|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* science is working efficiently, and it has discovered innumerable ways to live a more secure life than before. Therefore, technology is the main force that has changed our life.
> Communication System
> Last but not least, the communication system has completely changed our life in this technological world and has turned the world into a global village. Formerly, people used to send their messages by pigeon, then by postman, but now it has become very easy not just to send a message but also to make a video call to the person you want to reach. It is the internet, along with smartphones, that has made it easier for every individual to connect with all his distant relatives around the world. Thus, it is technology that has made our lives easier than before. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13182) Innovative Ideas in the Field of f95 zone Technology
[ https://issues.apache.org/jira/browse/ARROW-13182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13182: --- > Innovative Ideas in the Field of f95 zone Technology > > > Key: ARROW-13182 > URL: https://issues.apache.org/jira/browse/ARROW-13182 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Innovative ideas in the field of technology have simplified work and aided our rapid development. These ideas contribute to the creation of innovative technologies over time. In order to create an innovative idea, it is necessary to have knowledge, which is fundamental to this process. Thus we get the scheme: knowledge, idea, technology.
> To date, innovative technologies are traditionally divided into two segments: information technologies (technologies for automated information processing) and communication technologies (technologies for the storage and transmission of information). For example, with the help of communication technologies, people can receive and transmit various content even from different corners of our world. International relations, including education, business negotiations and much more, are now possible faster and more efficiently. If we recall the communication innovations in the field of education, first of all it should be emphasized that people can enter higher education institutions and study remotely regardless of their location. Furthermore, every qualified pedagogue teaches something new and useful. Communication with representatives of other countries contributes to our self-development. All this eventually promotes the creation of qualified, unique staff.
> Information technologies allow us:
> - To automate certain labour-intensive operations;
> - To automate and optimize production planning;
> - To optimize individual business processes (for example, customer relations, asset management, document management, management decision-making), taking into account the specifics of various branches of economic activity.
> Information technology is used for large data processing systems, computing on a personal computer, in science and education, in management, in computer-aided design and in the creation of systems with artificial intelligence. Information technologies are the modern technological systems of immense strategic importance (political, defence, economic, social and cultural), which led to the formation of a new concept of [*f95 zone*|https://complextime.com/f95zone-what-is-it-and-why-use-it/] the world order: "who owns the information owns the world."
> The spread of information and communication technologies plays an important role in structural changes in all areas of our life. For some, it will be difficult to learn these technologies. Workers who cannot study will have to give way to the younger generation. Thus we are faced with a problem: in order to use innovations in technologies and develop them, it is necessary to have qualified youth. First and foremost there is the question of education. Only education can create a developed generation that will continue to strive for new knowledge and meet the requirements of innovative technologies. In addition, I am convinced that innovative ideas in technologies have created a completely new life, which poses new challenges for our country. How we cope with these tasks will determine the future of our country. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13177) Technology Acceptance juego friv Model
[ https://issues.apache.org/jira/browse/ARROW-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13177: --- > Technology Acceptance juego friv Model > -- > > Key: ARROW-13177 > URL: https://issues.apache.org/jira/browse/ARROW-13177 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Advances in computing and information technology are changing the way people meet and communicate. People can meet, talk, and work together outside traditional meeting and office spaces. For instance, the introduction of software designed to help people schedule meetings and facilitate decision or learning processes is weakening geographical constraints and changing interpersonal communication dynamics. Information technology is also dramatically affecting the way people teach and learn.
> As new information technologies infiltrate workplaces, homes, and classrooms, research on user acceptance of new technologies has started to receive much attention from professionals as well as academic researchers. Developers and software industries are beginning to realize that lack of user acceptance of technology can lead to loss of money and resources.
> In studying user acceptance and use of technology, the TAM is one of the most cited models. The Technology Acceptance Model (TAM) was developed by Davis to explain computer-usage behavior. The theoretical basis of the model was Fishbein and Ajzen's Theory of Reasoned Action (TRA).
> The Technology Acceptance Model (TAM) is an information systems theory (a system consisting of the network of all communication channels used within an organization) that models how users come to accept and use a technology. The model suggests that when users are presented with a new software package, a number of factors influence their decision about how and when they will use it, notably:
> Perceived usefulness (PU) - This was defined by Fred Davis as "the degree to which a person believes that using a particular system would enhance his or her job performance".
> Perceived ease-of-use (PEOU) - Davis defined this as "the degree to which a person believes that using a particular system would be free from effort" (Davis, 1989).
> The goal of TAM is "to provide an explanation of the determinants of computer acceptance that is general, capable of explaining user behavior across a broad range of end-user computing technologies and user populations, while at the same time being both parsimonious and theoretically justified".
> According to the TAM, if a user perceives a specific technology as useful, she/he will believe in a positive use-performance relationship. Since effort is a finite resource, a user is likely to accept an application when she/he perceives it as easier to use than another. As a consequence, educational technology with a high level of PU and PEOU is more likely to induce positive perceptions. The relation between PU and PEOU is that PU mediates the effect of PEOU on attitude and intended use. In other words, while PU has direct impacts on attitude and use, PEOU influences attitude and use indirectly through PU.
> User acceptance is defined as "the demonstrable willingness within a user group to employ information technology for the tasks it is designed to support" (Dillon & Morris). Although this definition focuses on planned and intended uses of technology, studies report that individual perceptions of information technologies are likely to be influenced by the objective characteristics of technology, as well as by interaction with other users. For example, to the extent that one evaluates a new technology as useful, she/he is likely to use it. At the same time, her/his perception of the system is influenced by the way people around her/him evaluate and use the system.
> Studies on information technology continuously report that user attitudes are important factors affecting the success of a system. For the past several decades, many definitions of attitude have been proposed. However, all theories consider attitude to be a relationship between a person and an [*juego friv*|https://complextime.com/friv-everything-you-need-to-know-about-it/] object (Woelfel, 1995).
> In the context of information technologies, the technology acceptance model (TAM) is one approach to the study of attitude. TAM suggests users formulate a positive attitude toward a technology when they perceive it to be useful and easy to use (Davis, 1989).
> A review of scholarly research on IS acceptance and usage suggests that TAM has emerged as one of the most influential models in this stream of research.
[jira] [Deleted] (ARROW-13181) Big Data and Technology Services Market f 95 zone
[ https://issues.apache.org/jira/browse/ARROW-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13181: --- > Big Data and Technology Services Market f 95 zone > - > > Key: ARROW-13181 > URL: https://issues.apache.org/jira/browse/ARROW-13181 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Big data has been touted as the next massive transformation in global data analysis and management. Businesses around the globe have incorporated big data in their operations to make sense of the seemingly myriad data generated on a consistent basis. The adoption of big data technology and services has grown at a robust pace among end-use industries. As big data becomes more mainstream, and integration with cloud and artificial intelligence becomes more streamlined, further growth is projected. According to a recently published report, the global big data technology and services market is poised to reach a valuation of over US$ 184 Bn.
> Data-Driven Decision Making Continues to Fuel Adoption of Big Data Technology and Services
> Over the years, there has been a significant shift in how businesses make critical business decisions. Assumptions and traditional intelligence gathering have given way to fact-based, data-driven decision making, which has furthered the cause for adopting big data solutions. This change in the status quo has been one of the key factors in the growing adoption of big data technology and services in various end-use industries. As more businesses realize the advantages of big data in decision-making, it is highly likely that adoption of big data technology and services will grow at a steady pace in the short and long term.
> The information big data analysis brings to the fore has also helped businesses bridge the challenges associated with agility and stakeholder empowerment. Businesses have traditionally faced an uphill task in finding that elusive balance between agility and decentralization. Counting in everyone's opinion before making big decisions has been the utopian goal of businesses; however, it also comes with the risk of slowing down the decision-making process in a hyper-competitive environment. The RACI framework, which businesses have referred to in order to reduce ambiguity in choosing the right authority for decision-making, is becoming easier to navigate as access to data makes the entire decision-making process a seamless affair.
> Integration of Big Data with Traditional Business Intelligence - The Way Forward?
> Integration of big data technology and services with traditional business intelligence is being looked upon as the way forward for businesses focusing on quick fact-based decision making and improvement in customer experience. Business intelligence has been a reliable tool for companies to understand their target audience more intimately; however, the high turnaround time has remained an impediment. The incorporation of big data has mitigated this challenge to an extent, which in turn has fuelled adoption among end-users. In the future, it is highly likely that big data and business intelligence will become highly intertwined.
> Banking, Financial Services and Insurance (BFSI) Industry Continues to Be at the Forefront of Adoption
> Although adoption of big data technology and services has been pervasive, the BFSI sector has remained at the forefront of adoption since the early days of big data. The sheer volume of data generated on a daily basis in the BFSI industry has necessitated the adoption of holistic data monitoring, gathering, and analysis solutions. Some of the key challenges that the BFSI sector currently faces include fraud identification, unorganized data, and operational inefficiency. The inclusion of big data technology and services has helped alleviate some of these challenges to a great extent. On the back of these improvements, there has been significant penetration of big data in the BFSI sector. According to current estimates, revenues generated from adoption of big data technology and services are likely to reach over US$ 33 billion by 2026.
> Inclusion of Big Data Technology and Services Gaining Ground in the Healthcare Sector
> Big data has massive potential in the healthcare industry, with proponents touting benefits ranging from epidemic prediction to reduced cost of treatments. Although electronic health records (EHR) have been a staple in the healthcare sector for quite a while, their efficacy is limited to the medical history of patients. Big data, on the other hand, promises a comprehensive, holistic data analysis
[jira] [Deleted] (ARROW-13178) Disruptive Technologies firv
[ https://issues.apache.org/jira/browse/ARROW-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13178: --- > Disruptive Technologies firv > > > Key: ARROW-13178 > URL: https://issues.apache.org/jira/browse/ARROW-13178 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> I am not into technologies, which change ever so fast, and always. But I do observe technological trends, around which the development of scientific applications revolves.
> And of all trends, perhaps disruptive technologies are the defining path of industrial implications, a linear passage that technological progress almost invariably follows. Though the concept of "disruptive technologies" was only popularized in 1997 by Harvard Business School Professor Clayton Christensen in his best-seller "The Innovator's Dilemma", the phenomenon was already evidenced back in 1663, when Edward Somerset published designs for, and might have installed, a steam engine.
> As put forth by Clayton Christensen, disruptive technologies are initially low performers with poor profit margins, targeting only a minute sector of the market. However, they often develop faster than industry incumbents and eventually outpace the giants to capture significant market share, as their technologies, cheaper and more efficient, better meet prevailing consumer demands.
> In this case, steam engines effectively displaced horse power. The demand for steam engines was not initially high, due to unfamiliarity with the invention at the time, and the ease of use and availability of horses. However, as soon as economic activities intensified and societies prospered, a niche market for steam engines quickly developed as people wanted modernity and faster transportation.
> One epitome of modern disruptive technologies is Napster, a free and easy music sharing program that allowed users to distribute any piece of recording online. The disruptee here was conventional music producers. Napster relevantly identified the "non-market", the few who wanted to share their own music recordings for little commercial purpose, and thus provided them with what they most wanted. Napster soon blossomed and even transformed the way the internet was utilized.
> Nevertheless, there are more concerns in the attempt to define disruptive technologies than simply the definition itself.
> The feature most commonly mistaken for disruptive technology is sustaining technology. While the former brings new technological innovation, the latter refers to "successive incremental improvements to performance" incorporated into the existing products of market incumbents. Sustaining technologies can be radical, too; new improvements can herald the demise of current modes of production, like how music editing software helps Napster users with music customization and sharing, thereby trumping traditional whole-file transfers. The music editors are part of a sustaining technology to Napster, not a new disruptor. Thus, disruptive and sustaining technologies can thrive together, until the next wave of disruption comes.
> See how music editors are linked to steam engines? Not too closely, but each represents one aspect of the twin engines that drive progressive technologies; disruptors breed sustainers, and sustainers feed disruptors.
> This character of sustaining technologies brings us to another perspective on disruptive technologies: they not only change the way people do business, but also initiate a fresh wave of follow-up technologies that propel the disruptive technology to success. Sometimes, sustaining technologies manage to carve out a niche market of their own even when the disruptive initiator has already shut down. Music editor and maker software continues to thrive healthily, despite Napster's breakdown (though many other file sharing services were functioning by then), with products like AV Music Morpher Gold and Sound Forge 8.
> A disruptive technology is also different from a paradigm shift, which Thomas Kuhn used to describe "the process and result of a change in basic assumptions within the ruling theory of science". In disruptive technologies, there are no assumptions, only the rules of the game, in which change is brought about by the behaviors of market incumbents and new entrants. They augment different markets that eventually merge. In Clayton Christensen's words, newcomers to the industry almost invariably "crush the incumbents".
> While researching disruptive [*firv*|https://complextime.com/friv-everything-you-need-to-know-about-it/] technologies, I came across this
[jira] [Deleted] (ARROW-13184) Technology Trends That Will f9zone Dominate 2017
[ https://issues.apache.org/jira/browse/ARROW-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13184: --- > Technology Trends That Will f9zone Dominate 2017 > > > Key: ARROW-13184 > URL: https://issues.apache.org/jira/browse/ARROW-13184 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Technology has remarkably changed the way we live today; there is no denying it. Compared with our ancestors, we stand far from them in using different technologies for our day-to-day work.
> So many technologies have been developed in the past couple of years that have revolutionized our lives, and it's impossible to list each of them. Though technology changes fast with time, we can observe the trends in which it changes. Last year, 2016, brought so many fresh innovative ideas and creations in automation, integration and so on, and this year, 2017, is expected to continue a similar trend.
> In this article, we are going to discuss some of the notable trends for this year, which will make us look beyond the horizon.
> Gartner's 2016 Hype Cycle for emerging technologies has identified different technologies that will be trending this year. The cycle illustrates how technology innovations are redefining the relations between the customer and the marketer.
> This year, Gartner has identified Blockchains, Connected Homes, Cognitive Expert Advisors, Machine Learning, Software-defined Security, etc. as the overarching technology trends, which have the potential to reshape business models and offer enterprises a definite route to emerging markets and ecosystems.
> #1. Blockchain
> Popularly known as 'Distributed Ledger Technology' for both financial and non-financial transactions, blockchain is one of those mystifying concepts that only technologists understand to the fullest. Various advancements in blockchain helped many people and businesses in 2016 to experience its potential in the banking and finance industry. This year, it is anticipated that blockchain technology will go beyond just the banking sector, helping start-ups and established businesses address market needs with different application offerings.
> #2. Internet of Things & Smart Home Tech
> With the advent of IoT, we are already eyeing a world of inter-connected things, aren't we? Our dreams of living in smart homes were met to a certain extent in 2016. So, what is stopping us from fulfilling our dreams of living in smart connected homes?
> Well, the fact is that the market is full of abundant individual appliances and apps, but only a few solutions integrate them into a single, inclusive user experience. It is anticipated that 2017 will see this trend take a big step towards fulfilling our dreams.
> #3. Artificial Intelligence & Machine Learning
> In recent times, Artificial Intelligence and Machine Learning have taken the entire world by storm with their amazing inventions and innovative technologies. Observing the ongoing advancements in this field, it will no longer be mere imagination to experience a world where robots and machines dominate society.
> Last year, we witnessed the rise of ML algorithms on almost all major e-commerce portals and their associated mobile apps, which is further expected to spread across all social networking platforms, dating websites, and matrimonial websites in 2017.
> #4. Software-defined Security
> In 2016, we observed significant growth in server security. Many organizations have started recognizing the significance of cybersecurity to enable their emergence as digital businesses. The growth of cloud-based infrastructure is creating great demand for managing unstructured data; moreover, the lack of technical expertise and threats to data security are the key factors hindering the substantial growth of the software-defined security market this year.
> #5. Automation
> Automation will be the mainstay throughout 2017; the coming years will be transformative for the IT industry, enabling the automation of tasks performed by humans. When Machine Learning is combined with automation, [*f9zone*|https://complextime.com/f95zone-what-is-it-and-why-use-it/] marketers are likely to witness wide business opportunities with enriched market results.
> #6. Augmented Reality (AR) & Virtual Reality (VR)
> AR and VR transform the way users interact with each other and with software systems. The year 2016 saw path-breaking steps in AR and VR technology.
> With the launch of Oculus Rift, the market received an overwhelming response
[jira] [Closed] (ARROW-13184) Technology Trends That Will f9zone Dominate 2017
[ https://issues.apache.org/jira/browse/ARROW-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane closed ARROW-13184. -- Resolution: Not A Bug > Technology Trends That Will f9zone Dominate 2017 > > > Key: ARROW-13184 > URL: https://issues.apache.org/jira/browse/ARROW-13184 > Project: Apache Arrow > Issue Type: Bug > Reporter: Abigail Cole > Priority: Major
[jira] [Updated] (ARROW-13149) [R] Convert named lists to structs instead of (unnamed) lists
[ https://issues.apache.org/jira/browse/ARROW-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13149: --- Labels: pull-request-available (was: ) > [R] Convert named lists to structs instead of (unnamed) lists > - > > Key: ARROW-13149 > URL: https://issues.apache.org/jira/browse/ARROW-13149 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Jonathan Keane >Assignee: Jonathan Keane >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
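The distinction ARROW-13149 draws can be illustrated outside R: a named collection maps naturally to an Arrow struct (one field per name), while an unnamed collection maps to a list. A minimal Python sketch of that inference rule, assuming a hypothetical `infer_arrow_like_type` helper that only mimics the intended mapping and is not part of any Arrow API:

```python
def infer_arrow_like_type(value):
    """Sketch of the inference rule requested in ARROW-13149:
    named entries become struct fields, unnamed sequences become lists."""
    if isinstance(value, dict):  # analogue of an R named list
        return ("struct", {name: infer_arrow_like_type(v)
                           for name, v in value.items()})
    if isinstance(value, (list, tuple)):  # analogue of an R unnamed list
        children = [infer_arrow_like_type(v) for v in value]
        child = (children[0]
                 if children and all(c == children[0] for c in children)
                 else "mixed")
        return ("list", child)
    return type(value).__name__

# A named list of scalars -> struct<a: int, b: str>
print(infer_arrow_like_type({"a": 1, "b": "x"}))
# An unnamed list of ints -> list<int>
print(infer_arrow_like_type([1, 2, 3]))
```

The key design point is that the presence of names, not the element types, selects struct over list, mirroring the issue's title.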
[jira] [Updated] (ARROW-13172) [Java] Make TYPE_WIDTH in Vector public
[ https://issues.apache.org/jira/browse/ARROW-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li updated ARROW-13172: - Summary: [Java] Make TYPE_WIDTH in Vector public (was: Make TYPE_WIDTH in Vector public) > [Java] Make TYPE_WIDTH in Vector public > --- > > Key: ARROW-13172 > URL: https://issues.apache.org/jira/browse/ARROW-13172 > Project: Apache Arrow > Issue Type: Improvement > Components: Java > Reporter: Eduard Tudenhoefner > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Some Vector classes already expose TYPE_WIDTH publicly. It would be > helpful if all Vector classes did so. -- This message was sent by Atlassian Jira (v8.3.4#803005)
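The motivation for a public width constant is offset arithmetic: in a fixed-width vector, element i starts at byte offset i * TYPE_WIDTH in the data buffer. A hedged Python sketch of that arithmetic over a packed little-endian int32 buffer (an illustration of the layout only, not Arrow's Java API):

```python
import struct

TYPE_WIDTH = 4  # bytes per element for a 32-bit integer vector

def element_at(data: bytes, index: int) -> int:
    """Read element `index` from a packed little-endian int32 buffer,
    using the offset arithmetic a public TYPE_WIDTH enables."""
    offset = index * TYPE_WIDTH
    return struct.unpack_from("<i", data, offset)[0]

buf = struct.pack("<4i", 10, 20, 30, 40)  # 16-byte data buffer
print(element_at(buf, 2))  # 30
```

Without the width exposed, callers working against raw Arrow buffers would have to hard-code per-type byte sizes, which is exactly the duplication the issue asks to avoid.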
[jira] [Updated] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li updated ARROW-13174: - Summary: [C++][Compute] Add strftime kernel (was: [C+][Compute] Add strftime kernel) > [C++][Compute] Add strftime kernel > -- > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > To convert timestamps to a string representation with an arbitrary format we > require a strftime kernel (the inverse operation of the {{strptime}} kernel > we already have). > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
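The requested kernel mirrors the classic strftime/strptime pair: strptime parses a string into a timestamp, and strftime is its inverse, formatting a timestamp back into a string with an arbitrary format. The round trip can be sketched with Python's standard library (this illustrates the semantics only, not Arrow's compute API):

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"

# strptime: string -> timestamp (the kernel Arrow already has)
ts = datetime.strptime("2021-06-25 13:45:00", fmt)

# strftime: timestamp -> string (the inverse kernel this issue requests)
s = ts.strftime(fmt)

print(s)  # "2021-06-25 13:45:00"
assert datetime.strptime(s, fmt) == ts  # format/parse round trip
```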
[jira] [Created] (ARROW-13184) Technology Trends That Will f9zone Dominate 2017
Abigail Cole created ARROW-13184: Summary: Technology Trends That Will f9zone Dominate 2017 Key: ARROW-13184 URL: https://issues.apache.org/jira/browse/ARROW-13184 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole
[jira] [Created] (ARROW-13183) How Has Technology Changed f95z Our Lives?
Abigail Cole created ARROW-13183: Summary: How Has Technology Changed f95z Our Lives? Key: ARROW-13183 URL: https://issues.apache.org/jira/browse/ARROW-13183 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole In the midst of the darkness that engulfed the world, technology changed the entire life of human beings. Undoubtedly, technology has some negative repercussions, but its positive results carry more weight than the negative ones. However, it is a little difficult for us to believe that technology has changed our life, because it has taken its place slowly and gradually. The innumerable justifications spotlighted below show how technology has changed our life in toto. Education System Education is a broad field, but if we take only a single aspect, the way of learning, we can see what a great difference technology has made. For instance, when we were young it was hard for us to get a good education with a variety of examples; we used to buy different expensive books covering only limited topics, just to make notes and get good marks in our exams. However, in this technological world it has become very easy to access different topics on the internet in a very short span of time, and they can also be shared with friends on social media. Business System In ancient times it was difficult to advertise a newly launched business with outdated methods such as pasting posters on the wall or distributing pamphlets to people in a busy market. However, in this contemporary world technology has made it very easy to share advertisements for our business in different places, such as on internet sites, on social media, and on big LCDs along busy roads. So, this is how our life has changed thanks to technical assistance: we can easily promote our business in no time. Medical Department Besides the field of business, the Medical Department is at its peak just because of technology. In earlier times Malaria, a fatal disease, cost many people their lives, but now Malaria, which is caused by Plasmodium, can easily be treated without any risk. Similarly, medical *[f95z|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* science is working efficiently and has found innumerable ways to live a more secure life than before. Therefore, technology is the force that has changed our life. Communication System Last but not least, the communication system has completely changed our life in this technological world and has made the world a global village. Formerly, people used to send their messages with the help of pigeons, then the postman, but now it has become very easy not just to send a message but also to make a video call to the person you want to reach. It is the internet, along with smartphones, that has made it easier for every individual to connect with all his distant relatives around the world. Thus, it is technology that has made our lives easier than before. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13182) Innovative Ideas in the Field of f95 zone Technology
Abigail Cole created ARROW-13182: Summary: Innovative Ideas in the Field of f95 zone Technology Key: ARROW-13182 URL: https://issues.apache.org/jira/browse/ARROW-13182 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Innovative ideas in the field of technology have simplified our work and helped our rapid development. These ideas contribute to the creation of innovative technologies over time. In order to create an innovative idea, it is necessary to have knowledge, which is fundamental to this process. Thus we get the scheme: knowledge, idea, technology. To date, innovative technologies are traditionally divided into two segments: information technologies (technologies for automated information processing) and communication technologies (technologies for the storage and transmission of information). For example, with the help of communication technologies, people can receive and transmit various content while in different corners of our world. International relations, including education, business negotiations and much more, are now faster and more efficient. If we recall the communication innovations in the field of education, first of all it should be emphasized that people can enter higher education institutions and study remotely regardless of their location. Furthermore, every qualified pedagogue teaches something new and useful. Communication with representatives of other countries contributes to our self-development. All this eventually promotes the creation of qualified, unique staff. Information technologies allow: - The automation of certain labour-intensive operations; - The automation and optimization of production planning; - The optimization of individual business processes (for example, customer relations, asset management, document management, management decision-making), taking into account the specifics of various branches of economic activity. Information technology is used for large data processing systems, computing on a personal computer, in science and education, in management, in computer-aided design and in the creation of systems with artificial intelligence. Information technologies are the modern technological systems of immense strategic importance (political, defence, economic, social and cultural), which have led to the formation of a new concept of [*f95 zone*|https://complextime.com/f95zone-what-is-it-and-why-use-it/] the world order: "whoever owns the information owns the world." The spread of information and communication technologies plays an important role in structural changes in all areas of our life. For some, it will be difficult to learn these technologies. Workers who are not able to study them will have to give way to the younger generation. Thus we are faced with a problem: in order to use innovations in technology and develop them, it is necessary to have qualified youth. First and foremost there is the question of education. In any case, only education can create a developed generation that will continue to strive for new knowledge and will meet the requirements of innovative technologies. In addition, I am convinced that innovative ideas in technology have created a completely new life, which poses new challenges for our country. How we cope with these tasks will determine the future of our country. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13181) Big Data and Technology Services Market f 95 zone
Abigail Cole created ARROW-13181: Summary: Big Data and Technology Services Market f 95 zone Key: ARROW-13181 URL: https://issues.apache.org/jira/browse/ARROW-13181 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Big data has been touted as the next massive transformation in global data analysis and management. Businesses around the globe have incorporated big data into their operations to make sense of the myriad data generated on a consistent basis. The adoption of big data technology and services has grown at a robust pace among end-use industries. As big data becomes more mainstream, and integration with cloud and artificial intelligence becomes more streamlined, further growth is projected. According to a recently published report, the global big data technology and services market is poised to reach a valuation of over US$ 184 Bn. Data-Driven Decision Making Continues to Fuel Adoption of Big Data Technology and Services Over the years, there has been a significant shift in how businesses make critical business decisions. Assumptions and traditional intelligence gathering have given way to fact-based, data-driven decision making, which has furthered the cause for adopting big data solutions. The change in the status quo has been one of the key factors in the growing adoption of big data technology and services in various end-use industries. As more businesses realize the advantages of big data in decision-making, it is highly likely that adoption of big data technology and services will grow at a steady pace in the short and long term. The information big data analysis brings to the fore has also helped businesses bridge the challenges associated with agility and stakeholder empowerment. Businesses have traditionally faced an uphill task in finding that elusive balance between agility and decentralization. Counting in everyone's opinion before making big decisions has been the utopian focus of businesses; however, it also comes with the risk of slowing down the decision-making process in a hyper-competitive environment. The RACI framework, which businesses have referred to in order to reduce ambiguity in choosing the right authority for decision-making, is becoming easier to navigate as access to data makes the entire decision-making process a seamless affair. Integration of Big Data with Traditional Business Intelligence - The Way Forward? Integration of big data technology and services with traditional business intelligence is being looked upon as the way forward for businesses focusing on quick fact-based decision making and improvement in customer experience. Business intelligence has been a reliable tool for companies to understand their target audience more intimately; however, the high turnaround time has remained an impediment. The incorporation of big data has mitigated this challenge to an extent, which in turn has fuelled adoption among end-users. In the future, it is highly likely that big data and business intelligence will become highly intertwined. Banking, Financial Services and Insurance (BFSI) Industry Continues to be at the Forefront of Adoption Although adoption of big data technology and services has been pervasive, the BFSI sector has remained at the forefront of adoption since the early days of big data. The sheer volume of data generated on a daily basis in the BFSI industry has necessitated the adoption of holistic data monitoring, gathering, and analysis solutions. Some of the key challenges that the BFSI sector currently faces include fraud identification, unorganized data, and operational inefficiency. The inclusion of big data technology and services has helped alleviate some of these challenges to a great extent. On the back of these improvements, there has been significant penetration of big data in the BFSI sector. According to current estimates, revenues generated from the adoption of big data technology and services are likely to reach over US$ 33 billion by 2026. Inclusion of Big Data Technology and Services Gaining Ground in Healthcare Sector Big data has massive potential in the healthcare industry, with proponents touting benefits ranging from epidemic prediction to reduced treatment costs. Although electronic health records (EHR) have been a staple in the healthcare sector for quite a while, their efficacy is limited to the medical history of patients. Big data, on the other hand, promises comprehensive, holistic data analysis that can help healthcare providers manage massive volumes of data. The insights offered through the inclusion of big data technology and services can help healthcare *[f 95 zone|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* providers improve profitability, while improving the care received by
[jira] [Created] (ARROW-13180) Teaching With f95 Technology
Abigail Cole created ARROW-13180: Summary: Teaching With f95 Technology Key: ARROW-13180 URL: https://issues.apache.org/jira/browse/ARROW-13180 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Teaching with technology helps to expand student learning by assisting instructional objectives. However, it can be challenging to select the best technology tools without losing sight of the goal of student learning. An expert can find creative and constructive ways to integrate technology into the class. What do we mean by technology? The term technology refers to the development of the techniques and tools we use to solve problems or achieve goals. Technology can encompass all kinds of tools, from low-tech pencils, paper and a chalkboard to the use of presentation software, or high-tech tablets, online collaboration and conference tools and more. The newest technologies allow us to try things in physical and virtual classrooms that were not possible before. How can technology help students? Technology can help a student in the following ways: 1. Online collaboration tools: Technology has helped students and instructors to share documents online, edit them in real time, and project them on a screen. This gives students a collaborative platform on which to brainstorm ideas and document their work using text and pictures. 2. Presentation software: This enables the instructor to embed high-resolution photographs, diagrams, videos and sound files to augment the text and verbal lecture content. 3. Tablet: Here, tablets can be linked to computers, projectors, and the cloud so that students and instructors can communicate through text, drawings, and diagrams. 4. Course management tools: These allow instructors to organize all the resources students need for the class: the syllabus, assignments, readings, online quizzes. 5. Smartphone: These are a quick and easy way to survey students during class. They are a great instant-polling tool which can quickly assess students' understanding and help instructors adjust pace and content. 6. Lecture capture tools: The lecture capture tools allow instructors to record lectures directly from their computer without elaborate or additional classroom equipment, and students can review the recorded lectures at their own pace. Advantages of technology integration in the education sphere? The teaching strategies based on educational technology can be described as ethical practices that facilitate student learning and boost their capacity, productivity, and performance. Technology integration inspires positive changes in teaching methods on an international level. The following list of benefits will help in reaching a final conclusion: 1. Technology makes teaching easy: technology has power. It helps in the use of projectors and computer presentations to deliver *[f95|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* any type of lesson or instruction and improve the level of comprehension within the class, rather than giving theoretical explanations that students cannot understand. 2. It facilitates student progress: technology lets teachers rely on platforms and tools that enable them to keep track of individual achievements. 3. Education technology is good for the environment: if all schools were dedicated to using digital textbooks, can you imagine the amount of paper and the number of trees that would be saved? Students can be instructed to take an online test and submit their papers and homework through email. They can also be encouraged to use readers to read through the assigned literature. 4. It has made students enjoy learning: students enjoy learning through their addiction to Facebook, Instagram, Digg, and other websites from a very early age. The internet can distract them from the learning process, but learning can be made enjoyable by setting up a private Facebook group for the class and inspiring constructive conversations.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13179) Impact of Smart Technology on Data friv4school
Abigail Cole created ARROW-13179: Summary: Impact of Smart Technology on Data friv4school Key: ARROW-13179 URL: https://issues.apache.org/jira/browse/ARROW-13179 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole With evolving smart technologies, the entire process of rendering data entry services has become much easier. Smart technologies are now helping businesses strategically and economically by generating data from every possible source, including mobile phones, industrial equipment, smart accessories and personal computers. Data entry services are considered "smart" based on their responsiveness to incoming data. Businesses are looking for effective ways to manage data in order to obtain better value and support their ultimate objectives. Smart technologies tend to engage people and various smart devices with the related business, for better processing and collection of data from designated sources. To support and cope with the current evolution of such technologies, processes are being constantly renewed. There are various smart applications that enhance data analytics processes and make them even better. These include Cloud Computing, the Internet of Things, Smart Data and Machine Learning. Need for Smart Technology Data entry services, when offered with smart technologies, provide real-time data processing, thus improving a business's economic growth and providing a business-friendly option with efficient data management. Nowadays, businesses are striving for more innovative strategies while incorporating these smart apps. They eradicate the need for paper documents. They provide innovation with a customer-centered approach. These technologies are all industry-oriented, providing accurate results. They are scalable and easy to adopt. They work even better with unorganized data volumes. Collection of Data via Smart Technologies Smart technologies assist in collecting and assembling data through: Intelligent Capture, replacing template-based data extraction with an efficient capturing module and natural language understanding. Mobile Data Entry, for collecting data on various mobile devices, enabling smart data entry services. Robotic Process Automation (RPA), providing the latest smart recognition technology for improved data processing. Data Alteration through Smart Technologies For better use of these technologies, data entry services and methodologies are continuously being reshaped and revised, allowing organizations to take competitive advantage, along with enhancing the cost-efficiency and security of business operations. Smart technologies including Artificial Intelligence, Machine Learning and the Internet of Things have now replaced manual processes that are more time-consuming, leaving less room for human error. Let's talk about a few of these technologies: Artificial Intelligence and Machine Learning are more responsive and secure when it comes to managing any repetitive task, recognizing various patterns and enhancing the accuracy level. For an expanding number of data sources, and for creating connections between people, the internet, devices and businesses, the IoT (Internet of Things) is used extensively these days. From cloud computing services based on data entry services, businesses can derive benefit and manage the complexity of their data infrastructure. Effect of Intelligent Technologies Smart technologies are having a drastically positive impact on data entry services and offering a friendlier approach, providing benefits in the following ways: A better and more composed process, leading to a reduction in human errors. It has become faster and more efficient, with easy management of data in bulk and from different sources like paper forms, scanned images and much more. Streamlining business operations [*friv4school*|https://complextime.com/friv-everything-you-need-to-know-about-it/] and changing how businesses perceive data management projects. Increasing the potential to scale data entry processes and utilize innovative techniques. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13178) Disruptive Technologies firv
Abigail Cole created ARROW-13178: Summary: Disruptive Technologies firv Key: ARROW-13178 URL: https://issues.apache.org/jira/browse/ARROW-13178 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole I am not into technologies, those that change so ever fast, and always. But I do observe technological trends, along which the development of scientific applications revolves. And of all trends, perhaps disruptive technologies are the defining path of industrial implications, a linear passage that technological progress almost invariably follows. Though the concept of "disruptive technologies" is only popularized in 1997 by Harvard Business School Professor Clayton Christensen in his best-seller "The Innovator's Dilemma", the phenomenon was already evidenced back in 1663, when Edward Somerset published designs for, and might have installed, a steam engine. As put forth by Clayton Christensen, disruptive technologies are initially low performers of poor profit margins, targeting only a minute sector of the market. However, they often develop faster than industry incumbents and eventually outpace the giants to capture significant market shares as their technologies, cheaper and more efficient, could better meet prevailing consumers' demands. In this case, the steam engines effectively displaced horse power. The demand for steam engines was not initially high, due to the then unfamiliarity to the invention, and the ease of usage and availability of horses. However, as soon as economic activities intensified, and societies prospered, a niche market for steam engines quickly developed as people wanted modernity and faster transportation. One epitome of modern disruptive technologies is Napster, a free and easy music sharing program that allows users to distribute any piece of recording online. The disruptee here is conventional music producers. 
Napster relevantly identified the "non-market", the few who wanted to share their own music recordings for little commercial purpose, and thus provided them with what they most wanted. Napster soon blossomed and even transformed the way the internet was utilized. Nevertheless, there are more concerns in the attempt to define disruptive technologies than simply the definition itself. One most commonly mistaken feature for disruptive technologies is sustaining technologies. While the former brings new technological innovation, the latter refers to "successive incremental improvements to performance" incorporated into existing products of market incumbents. Sustaining technologies could be radical, too; the new improvements could herald the demise of current states of production, like how music editor softwares convenience Napster users in music customization and sharing, thereby trumping over traditional whole-file transfers. The music editors are part of a sustaining technological to Napster, not a new disruptor. Thus, disruptive and sustaining technologies could thrive together, until the next wave of disruption comes. See how music editors are linked to steam engines? Not too close, but each represents one aspect of the twin engines that drive progressive technologies; disruptors breed sustainers, and sustainers feed disruptors. This character of sustaining technologies brings us to another perspective of disruptive technologies: they not only change the way people do business, but also initiate a fresh wave of follow-up technologies that propel the disruptive technology to success. Sometimes, sustaining technologies manage to carve out a niche market for its own even when the disruptive initiator has already shut down. Music editor and maker softwares continue to healthily thrive, despite Napster's breakdown (though many other file sharing services are functioning by that time), with products like the AV Music Morpher Gold and Sound Forge 8. 
A disruptive technology is also different from a paradigm shift, which Thomas Kuhn used to describe "the process and result of a change in basic assumptions within the ruling theory of science". In disruptive technologies, there are no assumptions, but only the rules of game of which the change is brought about by the behaviors of market incumbents and new entrants. They augment different markets that eventually merge. In Clayton Christensen's words, newcomers to the industry almost invariably "crush the incumbents". While researching on disruptive [*firv*|https://complextime.com/friv-everything-you-need-to-know-about-it/] technologies, I came across this one simple line that could adequately capture what these technologies are about, "A technology that no one in business wants but that goes on to be a trillion-dollar industry." Interesting how a brand new technology that seemingly bears little value could shake up an entire industry, isn't it?
[jira] [Created] (ARROW-13177) Technology Acceptance juego friv Model
Abigail Cole created ARROW-13177: Summary: Technology Acceptance juego friv Model Key: ARROW-13177 URL: https://issues.apache.org/jira/browse/ARROW-13177 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Advances in computing and information technology are changing the way people meet and communicate. People can meet, talk, and work together outside traditional meeting and office spaces. For instance, with the introduction of software designed to help people schedule meetings and facilitate decision or learning processes, is weakening geographical constraints and changing interpersonal communication dynamics. Information technology is also dramatically affecting the way people teach and learn. As new information technologies infiltrate workplaces, home, and classrooms, research on user acceptance of new technologies has started to receive much attention from professionals as well as academic researchers. Developers and software industries are beginning to realize that lack of user acceptance of technology can lead to loss of money and resources. In studying user acceptance and use of technology, the TAM is one of the most cited models. The Technology Acceptance Model (TAM) was developed by Davis to explain computer-usage behavior. The theoretical basis of the model was Fishbein and Ajzen's Theory of Reasoned Action (TRA). The Technology Acceptance Model (TAM) is an information systems (System consisting of the network of all communication channels used within an organization) theory that models how users come to accept and use a technology, The model suggests that when users are presented with a new software package, a number of factors influence their decision about how and when they will use it, notably: Perceived usefulness (PU) - This was defined by Fred Davis as "the degree to which a person believes that using a particular system would enhance his or her job performance". 
Perceived ease-of-use (PEOU) Davis defined this as "the degree to which a person believes that using a particular system would be free from effort" (Davis, 1989). The goal of TAM is "to provide an explanation of the determinants of computer acceptance that is general, capable of explaining user behavior across a broad range of end-user computing technologies and user populations, while at the same time being both parsimonious and theoretically justified". According to the TAM, if a user perceives a specific technology as useful, she/he will believe in a positive use-performance relationship. Since effort is a finite resource, a user is likely to accept an application when she/he perceives it as easier to use than another .As a consequence, educational technology with a high level of PU and PEOU is more likely to induce positive perceptions. The relation between PU and PEOU is that PU mediates the effect of PEOU on attitude and intended use. In other words, while PU has direct impacts on attitude and use, PEOU influences attitude and use indirectly through PU. User acceptance is defined as "the demonstrable willingness within a user group to employ information technology for the tasks it is designed to support" (Dillon & Morris). Although this definition focuses on planned and intended uses of technology, studies report that individual perceptions of information technologies are likely to be influenced by the objective characteristics of technology, as well as interaction with other users. For example, the extent to which one evaluates new technology as useful, she/he is likely to use it. At the same time, her/his perception of the system is influenced by the way people around her/him evaluate and use the system. Studies on information technology continuously report that user attitudes are important factors affecting the success of the system. For the past several decades, many definitions of attitude have been proposed. 
However, all theories consider attitude to be a relationship between a person and an [*juego friv*|https://complextime.com/friv-everything-you-need-to-know-about-it/] ** object (Woelfel, 1995). In the context of information technologies, is an approach to the study of attitude - the technology acceptance model (TAM). TAM suggests users formulate a positive attitude toward the technology when they perceive the technology to be useful and easy to use (Davis, 1989). A review of scholarly research on IS acceptance and usage suggests that TAM has emerged as one of the most influential models in this stream of research The TAM represents an important theoretical contribution toward understanding IS usage and IS acceptance behaviors. However, this model -- with its original emphasis on the design of system characteristics - does not account for social influence in the adoption and utilization of new information systems. --
[jira] [Created] (ARROW-13176) T Is for Technology in Triathlon Training
Abigail Cole created ARROW-13176: Summary: T Is for Technology in Triathlon Training Key: ARROW-13176 URL: https://issues.apache.org/jira/browse/ARROW-13176 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole The original triathletes were amazing. Dave Scott and Mark Allen accomplished amazing feats in triathlon long before technology took over the sport. They didn't have metrics like we have today and they certainly didn't have all of the information gathering abilities we have. Yet, they set records and competed valiantly. In fact Mark Allen still holds the marathon record in Kona to this day. Technology is a great friend to triathletes but is does have a downside. TECHNOLOGY ITEMS So technology has taken over every part of triathlon. One of the most widely researched areas is the area of the triathlon watch. Each and every year there are new watches available for purchase that have ever increasing measurements for the triathlete. My personal favorite is the Garmin 910XT. This watch gives me heart rate, power (with a power meter), pacing (with optional foot pod), speed, cadence (with optional cadence sensor), mileage, yards in swimming, and much more. Each of these measurements aid me in measuring my success or failures in each and every training session and race. Technology has been making huge strides in bicycles and wheel sets. The amount of research going into these two items within the world of triathlon is incredible. Each and every year there are new and exciting advances in aerodynamic speed in bicycles and wheel sets. Much of the time these technologies can take on two very different vantage points. This was most evident at the 2016 World Championships in Kona. Diamond Bikes unveiled their Andean bike which fills in all the space in between the front tire and the back tire with a solid piece to make the wind pass by this area for aerodynamics. Another bike debuted at Kona this year with the exact opposite idea. 
The Ventum bike eliminated the down tube of the bike and made a vacant space in between the front tire and the back tire with only the top tube remaining. These are two very different ideas about aerodynamics. This is one of the amazing things about the advancement of technology and one of the downsides as well. Each and every piece of equipment in triathlon is undergoing constant technology advancements. Shoes, wetsuits, socks, nutrition, hats, sunglasses, helmets, racing kits, and anything else you can imagine. This world of technology in triathlon is not near to completion and will continue to push the limits. THE UPSIDE TO TECHNOLOGY Technology in triathlon is amazing. These new items are exciting and make each and every year different. There are new advancements that help triathletes go faster and longer. These new technologies help even the amateur triathlete to go faster. Just the purchase of new wheels can mean the difference between being on or off the podium. The advancement of shoes has aided many athletes to avoid the injuries that plague so many such as plantar fasciitis. Technology will continue to aid the sport in becoming better and better. THE DOWNSIDE TO TECHNOLOGY The downside to technology is that the amateur triathlete arrives at their local race already incapable of winning because someone else has the money to buy some of the latest technology. The biggest purchases such as wheel sets and bicycles can be cost prohibitive to the average triathlete and yet *[friv.com|https://complextime.com/friv-everything-you-need-to-know-about-it/]* there are individuals who purchase these items at alarming rates. The amateur triathlete can also feel overwhelmed at what to purchase and what not to purchase. Some items of technology are not worth the extra cost because they do not decrease racing time significantly enough for what they cost. Now that these new technologies have been out awhile, knock-offs have begun to make lower cost items. 
It will be interesting to watch the flood of these knock-offs into the market and see how that affects the big boys of technology. If you are an amateur triathlete shop smart and don't go buy the new gadgets just because they are new. Make sure to invest in items that are going to truly make you faster and not just a gimmick. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13175) Technology Trends That Will f9zone Dominate 2017
Abigail Cole created ARROW-13175: Summary: Technology Trends That Will f9zone Dominate 2017 Key: ARROW-13175 URL: https://issues.apache.org/jira/browse/ARROW-13175 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Technology has remarkably changed the way we live today, there is no denial to it. Compared with our ancestors, we stand far away from them in using different technologies for our day-to-day works. So many technologies are developed in the past couple of years that have revolutionized our lives, and it's impossible to list each of them. Though technology changes fast with time, we can observe the trends in which it changes. Last year, 2016 had bought so many fresh innovative ideas and creations towards automation and integration etc., and this year 2017 is expected to continue the similar kind of trend. In this article, we are going to discuss some of the notable trends for this year, which will make us look beyond the horizon. Gartner's 2016 Hype Cycle for emerging technologies have identified different technologies that will be trending this year. The cycle illustrates the fact how technology innovations are redefining the relations between the customer and marketer. This year, Gartner has identified Blockchains, Connected Homes, Cognitive Expert Advisors, Machine Learning, Software-defined Security etc. as the overarching technology trends, which have the potential of reshaping the business models and offering enterprises the definite route to emerging markets and ecosystems. #1. Blockchain Popularly known as 'Distributed Ledger Technology' for both financial and non-financial transactions, is one of the mystifying concepts that technologists could only understand to the fullest. Various advancements in blockchain have helped many people and more businesses in 2016, to experience its potential in banking and finance industry. 
This year, it is anticipated that blockchain technology would go beyond just banking sector, helping the start-ups and established businesses to address the market needs with different application offerings. #2. Internet of Things & Smart Home Tech With the advent of IoT, we are already eyeing the world of inter-connected things, aren't we? Our dreams of living in smart homes are met to a certain extent in 2016. So, what is stopping us from fulfilling our dreams of living in smart connected homes? Well, the fact is that the market is full of abundant individual appliances and apps, but only a little amount of solutions integrate them into a single, inclusive user experience. It is anticipated that 2017 will notice this trend to undergo a big step towards fulfilling our dreams. #3. Artificial Intelligence & Machine Learning In the recent times, Artificial Intelligence and Machine Learning have taken the entire world by storm with its amazing inventions and innovative technologies. By observing the on-going advancements in this field, it will be no longer an imagination to experience the world where robots and machine will dominate the society. Last year, we have witnessed the rise of ML algorithms on almost all major e-commerce portals and its associated mobile apps, which is further expected to spread across on all social networking platforms, dating websites, and matrimonial websites in 2017. #4. Software-defined Security In 2016, we have observed a significant growth for increased server security. Many organizations have started recognizing the significance of cybersecurity to enable their move of emerging as digital businesses. The growth of cloud-based infrastructure is causing a great demand for managing unstructured data, and moreover, the lack of technical expertise and threat to data security, are the key factors hindering the substantial growth of software-defined security market this year. #5. 
Automation Automation will be the mainstay throughout 2017, the coming years will be transformative for IT industry, enabling the automation of human performed tasks. When Machine Learning is combined with automation, the marketers are likely to witness wide business opportunities with enriched market results. #6. Augmented Reality (AR) & Virtual Reality (VR) AR and VR transform the way users interact with each other and software systems. The year 2016 has experienced path-breaking steps in AR and VR technology. With the launch of Oculus Rift, the market had received an overwhelming response from the users, making way to a plethora of VR-based apps and games. Further, when Pokémon Go was released, it has completely re-defined the definition of gaming experience. It was one of the most profitable and downloaded the mobile application of 2016. The response AR and VR technology has received last year was farfetched, and it forecasts that the world is ready to adopt this
[jira] [Updated] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-13174: -- Description: To convert timestamps to a string representation with an arbitrary format we require a strftime kernel (the inverse operation of the {{strptime}} kernel we already have). See [comments here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. was: To express timestamps with arbitrary format we require a strftime kernel. See [comments here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. > [C+][Compute] Add strftime kernel > - > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > To convert timestamps to a string representation with an arbitrary format we > require a strftime kernel (the inverse operation of the {{strptime}} kernel > we already have). > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12744) [C++][Compute] Add rounding kernel
[ https://issues.apache.org/jira/browse/ARROW-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369415#comment-17369415 ] Eduardo Ponce commented on ARROW-12744: --- A draft PR is available that implements a *round* function as a unary scalar function. It outputs float64 for integral inputs and a matching type for floating-point inputs. Rounding behavior is controlled via two options: a rounding mode (which specifies the displacement behavior) and a multiple (scale and precision). Feedback is welcome on the implementation, the rounding options and their names, and the documentation. > [C++][Compute] Add rounding kernel > -- > > Key: ARROW-12744 > URL: https://issues.apache.org/jira/browse/ARROW-12744 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Ian Cook >Assignee: Eduardo Ponce >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Kernel to round an array of floating point numbers. Should return an array of > the same type as the input. Should have an option to control how many digits > after the decimal point (default value 0 meaning round to the nearest > integer). > Midpoint values (e.g. 0.5 rounded to nearest integer) should round away from > zero (up for positive numbers, down for negative numbers). -- This message was sent by Atlassian Jira (v8.3.4#803005)
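The midpoint rule requested here ("round ties away from zero", unlike Python's built-in banker's rounding) can be sketched with the standard library. This is an illustration of the behavior, not the Arrow compute API; the helper name and signature are hypothetical.

```python
# Sketch of the "half away from zero" midpoint rule described in the ticket,
# using only the standard library -- not Arrow's draft round kernel.
from decimal import Decimal, ROUND_HALF_UP  # ROUND_HALF_UP = ties away from zero

def round_away_from_zero(value, ndigits=0):
    """Round ties away from zero: 0.5 -> 1.0, -0.5 -> -1.0 (hypothetical helper)."""
    exp = Decimal(1).scaleb(-ndigits)  # e.g. ndigits=1 -> Decimal('0.1')
    return float(Decimal(str(value)).quantize(exp, rounding=ROUND_HALF_UP))

print([round_away_from_zero(v) for v in (0.5, 1.5, -0.5, 2.4)])
```

Note the contrast with the built-in: `round(0.5)` is `0` in Python (round half to even), while the rule above yields `1.0`.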
[jira] [Assigned] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-13174: -- Assignee: Rok Mihevc > [C+][Compute] Add strftime kernel > - > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > To express timestamps with arbitrary format we require a strftime kernel. > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13133) [R] Add support for locale-specific day of week (and month of year?) returns from timestamp accessor functions
[ https://issues.apache.org/jira/browse/ARROW-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369370#comment-17369370 ] Rok Mihevc commented on ARROW-13133: https://issues.apache.org/jira/browse/ARROW-13174 > [R] Add support for locale-specific day of week (and month of year?) returns > from timestamp accessor functions > -- > > Key: ARROW-13133 > URL: https://issues.apache.org/jira/browse/ARROW-13133 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nic Crane >Priority: Major > > The R binding for the wday date accessor added in this PR > [https://github.com/apache/arrow/pull/10507] currently doesn't support > returning the string representation of the day of the week (e.g. "Mon") and > only supports the numeric representation (e.g. 1). > We should implement this, though discussion should be had about whether this > belongs at the R or C++ level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-13133) [R] Add support for locale-specific day of week (and month of year?) returns from timestamp accessor functions
[ https://issues.apache.org/jira/browse/ARROW-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369370#comment-17369370 ] Rok Mihevc edited comment on ARROW-13133 at 6/25/21, 10:09 AM: --- ARROW-13174 was (Author: rokm): https://issues.apache.org/jira/browse/ARROW-13174 > [R] Add support for locale-specific day of week (and month of year?) returns > from timestamp accessor functions > -- > > Key: ARROW-13133 > URL: https://issues.apache.org/jira/browse/ARROW-13133 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nic Crane >Priority: Major > > The R binding for the wday date accessor added in this PR > [https://github.com/apache/arrow/pull/10507] currently doesn't support > returning the string representation of the day of the week (e.g. "Mon") and > only supports the numeric representation (e.g. 1). > We should implement this, though discussion should be had about whether this > belongs at the R or C++ level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
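The feature requested above, a string day-of-week alongside the numeric one, can be illustrated with the standard library (for the idea only; the actual binding would live in Arrow's R/C++ layer):

```python
# What the ticket asks for, sketched with the standard library: a numeric
# day of week plus its abbreviated name. The name is locale-dependent.
from datetime import date

d = date(2021, 6, 25)                    # a Friday
print(d.isoweekday(), d.strftime("%a"))  # day number and abbreviated name
```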
[jira] [Created] (ARROW-13174) [C++][Compute] Add strftime kernel
Rok Mihevc created ARROW-13174: -- Summary: [C++][Compute] Add strftime kernel Key: ARROW-13174 URL: https://issues.apache.org/jira/browse/ARROW-13174 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc To express timestamps with an arbitrary format we require a strftime kernel. See [comments here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
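The strptime/strftime relationship the ticket relies on (strftime being the inverse of the existing {{strptime}} kernel) can be shown with the standard library; this is Python's datetime, not Arrow's compute API:

```python
# The strptime/strftime round-trip the ticket describes, illustrated with
# Python's standard library rather than the Arrow compute API.
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
ts = datetime.strptime("2021-06-25 10:09:00", fmt)  # string -> timestamp
s = ts.strftime(fmt)                                # timestamp -> string
assert s == "2021-06-25 10:09:00"                   # round-trips exactly
```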
[jira] [Created] (ARROW-13173) [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
Yibo Cai created ARROW-13173: Summary: [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally Key: ARROW-13173 URL: https://issues.apache.org/jira/browse/ARROW-13173 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 4.0.1, 4.0.0 Reporter: Yibo Cai Observed a test case failure in a Travis CI arm64 job. https://travis-ci.com/github/apache/arrow/jobs/518630381#L2271 {{TestAsyncUtil.ReadaheadFailed}} asserted at https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_generator_test.cc#L1131 It looks like _SleepABit()_ cannot guarantee that _finished_ will be set in time, especially on busy CI hosts where many jobs share one machine. cc [~westonpace] -- This message was sent by Atlassian Jira (v8.3.4#803005)
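The race described above, sleeping a fixed amount and then asserting a flag, is the classic source of this kind of flake; waiting on an event with a generous timeout removes the timing dependence. A minimal Python sketch of the pattern (names are illustrative, not the Arrow test's):

```python
# The ticket's race: a fixed-length sleep cannot guarantee a background
# task has finished on a busy host. Blocking on an event with a generous
# timeout is deterministic. Names here are illustrative, not Arrow's.
import threading
import time

finished = threading.Event()

def background_task():
    time.sleep(0.05)  # simulate work of unpredictable duration
    finished.set()

threading.Thread(target=background_task).start()

# Fragile: time.sleep(0.01); assert finished.is_set()   # may fail under load
# Robust: wait until the flag is set, or a long timeout expires.
assert finished.wait(timeout=5.0)
```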
[jira] [Updated] (ARROW-13173) [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
[ https://issues.apache.org/jira/browse/ARROW-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibo Cai updated ARROW-13173: - Fix Version/s: 5.0.0 > [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally > - > > Key: ARROW-13173 > URL: https://issues.apache.org/jira/browse/ARROW-13173 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 4.0.0, 4.0.1 >Reporter: Yibo Cai >Priority: Major > Fix For: 5.0.0 > > > Observed one test case failure from Travis CI arm64 job. > https://travis-ci.com/github/apache/arrow/jobs/518630381#L2271 > {{TestAsyncUtil.ReadaheadFailed}} asserted at > https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_generator_test.cc#L1131 > Looks _SleepABit()_ cannot guarantee that _finished_ will be set in time, > especially on busy CI hosts where many jobs share one machine. > cc [~westonpace] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13172) Make TYPE_WIDTH in Vector public
[ https://issues.apache.org/jira/browse/ARROW-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13172: --- Labels: pull-request-available (was: ) > Make TYPE_WIDTH in Vector public > > > Key: ARROW-13172 > URL: https://issues.apache.org/jira/browse/ARROW-13172 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Eduard Tudenhoefner >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Some Vector classes already expose the TYPE_WIDTH publicly. It would be > helpful if all Vectors would do that -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13172) Make TYPE_WIDTH in Vector public
Eduard Tudenhoefner created ARROW-13172: --- Summary: Make TYPE_WIDTH in Vector public Key: ARROW-13172 URL: https://issues.apache.org/jira/browse/ARROW-13172 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Eduard Tudenhoefner Some Vector classes already expose TYPE_WIDTH publicly. It would be helpful if all Vector classes did that. -- This message was sent by Atlassian Jira (v8.3.4#803005)