[jira] [Resolved] (ARROW-12730) [MATLAB] Update featherreadmex and featherwritemex to build against latest arrow c++ APIs
[ https://issues.apache.org/jira/browse/ARROW-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kouhei Sutou resolved ARROW-12730.
----------------------------------
    Fix Version/s: 5.0.0
       Resolution: Fixed

Issue resolved by pull request 10305
[https://github.com/apache/arrow/pull/10305]

> [MATLAB] Update featherreadmex and featherwritemex to build against latest arrow c++ APIs
> ------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12730
>                 URL: https://issues.apache.org/jira/browse/ARROW-12730
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: MATLAB
>            Reporter: Sarah Gilmore
>            Assignee: Sarah Gilmore
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The mex functions featherreadmex and featherwritemex currently do not compile if you are using the latest arrow c++ APIs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-13187:
--------------------------------
    Attachment: backward-refs.png

> [c++][python] Possibly memory not deallocated when reading in CSV
> ------------------------------------------------------------------
>
>                 Key: ARROW-13187
>                 URL: https://issues.apache.org/jira/browse/ARROW-13187
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 4.0.1
>            Reporter: Simon
>            Priority: Minor
>         Attachments: backward-refs.png, forward-refs.png
>
> When one reads in a table from CSV in pyarrow version 4.0.1, it appears that the read-in table variable is not freed (or not freed fast enough). I'm unsure whether this is because of pyarrow or because of the way pyarrow memory allocation interacts with Python memory allocation. I encountered it when processing many large CSVs sequentially.
> When I run the following piece of code, RAM usage increases quite rapidly until the process runs out of memory.
> {code:python}
> import pyarrow as pa
> import pyarrow.csv
>
> # Generate some CSV file to read in
> print("Generating CSV")
> with open("example.csv", "w+") as f_out:
>     for i in range(0, 1000):
>         f_out.write("123456789,abc def ghi jkl\n")
>
> def read_in_the_csv():
>     table = pa.csv.read_csv("example.csv")
>     print(table)  # Not strictly necessary to replicate the bug; table can also be an unused variable
>     # This will free up the memory, as a workaround:
>     # table = table.slice(0, 0)
>
> # Read in the CSV many times
> print("Reading in a CSV many times")
> for j in range(10):
>     read_in_the_csv()
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369785#comment-17369785 ]

Weston Pace commented on ARROW-13187:
--------------------------------------

I have tracked down the cause further. I'm not entirely sure what the correct fix should be, but I think it is a problem in Cython.

The issue first occurs after commit 79ae4f6db3dfe06ba2e1b5c285a6695cfa58cf3d (ARROW-8732: [C++] Add basic cancellation API). The method "read_csv" calls "SignalStopHandler()", which calls "signal.getsignal", which calls "signal.py::_int_to_enum", which intentionally triggers a ValueError (as is normal in Python). That ValueError has an associated traceback which is not disposed of correctly. The traceback holds a reference to each frame of the stack, and one of those frames holds a reference to "table". Since a new traceback is generated on every loop iteration, none of the CSV tables are properly disposed of.

The slice call in the original report or "del table" is a workable workaround. As long as the frames aren't too big, the garbage collector will eventually run and clean them up long before much memory is lost.

I have no idea why the ValueError/traceback is not being disposed of. I know Cython has to play some games to manage tracebacks, so it's possible there is an issue there. I believe I created a reproduction in pure Python calling getsignal and it manages memory correctly, so I think CPython itself is in the clear.

I've created a script to reproduce the issue that also uses objgraph to generate reference graphs. It runs only one iteration, so it is quick and doesn't exhaust the system's RAM. It should print 0 as the last line; if there is a leak it prints ~270M.
{code:python}
import gc
import sys

import pyarrow as pa
import pyarrow.csv
import pyarrow.parquet
import objgraph

# Generate some CSV file to read in
print("Generating CSV")
with open("example.csv", "w+") as f_out:
    for i in range(0, 1000):
        unused = f_out.write("123456789,abc def ghi jkl\n")

def read_in_the_csv():
    table = pa.csv.read_csv("example.csv")

print(pa.total_allocated_bytes())

gc.disable()
gc.collect()
objs = gc.get_objects()

read_in_the_csv()

objs2 = gc.get_objects()
offensive_ids = set([id(obj) for obj in objs2]) - set([id(obj) for obj in objs])
badobjs = [obj for obj in objs2 if id(obj) in offensive_ids]
print(len(badobjs))
smallbadobjs = [obj for obj in badobjs
                if 'frame' in str(type(obj)) and 'read_in_the_csv' in str(obj)]
objgraph.show_refs(smallbadobjs, refcounts=True)
objgraph.show_backrefs(smallbadobjs, refcounts=True)

print(pa.total_allocated_bytes())
{code}

So at this point I surrender and ask [~apitrou], [~jorisvandenbossche], or [~amol-] for help :)

*Forward refs show a frame in the traceback still references Table:*

!forward-refs.png!

*Backward refs show the frame is referenced as part of a traceback (note: this graph is truncated and does not show the source ValueError; also, the dict and two lists are from my debugging code and are not related to the issue):*

!backward-refs.png!

--
This message was sent by Atlassian Jira
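The retention mechanism described in the comment above can be sketched in pure Python. This is a hypothetical standalone illustration, not Arrow or Cython code; the Payload class stands in for a large object such as a pyarrow Table, and leaked_exc simulates an exception whose traceback is never disposed of.

```python
import gc
import weakref

class Payload:
    """Stand-in for a large object such as a pyarrow Table."""

leaked_exc = None  # simulates a traceback that is not disposed of correctly

def work():
    payload = Payload()
    ref = weakref.ref(payload)
    try:
        raise ValueError("simulated signal lookup failure")
    except ValueError as exc:
        global leaked_exc
        leaked_exc = exc  # keeping the exception keeps exc.__traceback__ alive
    return ref

ref = work()
# The stored traceback references work()'s frame, and that frame's locals
# still include 'payload', so the object outlives the function call.
assert ref() is not None

leaked_exc = None  # dispose of the exception and its traceback
gc.collect()
assert ref() is None  # the payload is now freed
```

Clearing the stored exception releases the frame and, through it, the payload, which mirrors why "del table" or forcing a collection works around the reported leak.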
[jira] [Updated] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-13187:
--------------------------------
    Attachment: forward-refs.png

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13190) [C++] [Gandiva] Change behavior of INITCAP function
[ https://issues.apache.org/jira/browse/ARROW-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-13190:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] [Gandiva] Change behavior of INITCAP function
> ----------------------------------------------------
>
>                 Key: ARROW-13190
>                 URL: https://issues.apache.org/jira/browse/ARROW-13190
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++ - Gandiva
>            Reporter: Anthony Louis Gotlib Ferreira
>            Assignee: Anthony Louis Gotlib Ferreira
>            Priority: Trivial
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current behavior of the *INITCAP* function is to uppercase the first character of each word and leave the remaining characters as they are.
> The desired behavior is to uppercase the first letter of each word and lowercase the rest. Any non-alphanumeric character should be treated as a word separator.
> That behavior is based on these database systems:
> * [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
> * [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
> * [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
> * [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13191) [Go] Support external schema in ipc readers
[ https://issues.apache.org/jira/browse/ARROW-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-13191:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Go] Support external schema in ipc readers
> --------------------------------------------
>
>                 Key: ARROW-13191
>                 URL: https://issues.apache.org/jira/browse/ARROW-13191
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Go
>            Reporter: Seth Hollyman
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Apologies if I'm imprecise here; I'm still coming up to speed on the Arrow details.)
> The IPC message format describes how data and metadata messages are encapsulated, but it is not a requirement that each message include the schema.
> In Go, github.com/apache/arrow/go/arrow/ipc contains NewReader() for setting up reading of IPC messages, and it accepts the WithSchema option to pass a schema into said reader. However, the implementation merely uses that information to verify that the schema it reads from the IPC stream matches the passed-in schema. This request is to allow WithSchema to behave as expected and use the option-provided Schema for performing reads.
> The one gotcha here appears to be the dictionary type map, which is currently retained independently of the schema but is part of the internal readSchema() setup. Completeness may warrant another option for communicating those externally as well; or perhaps the option-passed Schema should be documented as not supporting dictionary types.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-13191) [Go] Support external schema in ipc readers
Seth Hollyman created ARROW-13191:
-------------------------------------
             Summary: [Go] Support external schema in ipc readers
                 Key: ARROW-13191
                 URL: https://issues.apache.org/jira/browse/ARROW-13191
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Go
            Reporter: Seth Hollyman

(Apologies if I'm imprecise here; I'm still coming up to speed on the Arrow details.)

The IPC message format describes how data and metadata messages are encapsulated, but it is not a requirement that each message include the schema.

In Go, github.com/apache/arrow/go/arrow/ipc contains NewReader() for setting up reading of IPC messages, and it accepts the WithSchema option to pass a schema into said reader. However, the implementation merely uses that information to verify that the schema it reads from the IPC stream matches the passed-in schema. This request is to allow WithSchema to behave as expected and use the option-provided Schema for performing reads.

The one gotcha here appears to be the dictionary type map, which is currently retained independently of the schema but is part of the internal readSchema() setup. Completeness may warrant another option for communicating those externally as well; or perhaps the option-passed Schema should be documented as not supporting dictionary types.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13190) [C++] [Gandiva] Change behavior of INITCAP function
[ https://issues.apache.org/jira/browse/ARROW-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Louis Gotlib Ferreira updated ARROW-13190:
--------------------------------------------------
    Description: 
The current behavior of the *INITCAP* function is to turn the first character of each word uppercase and remains the other as is.

The desired behavior is to turn the first letter uppercase and the other lowercase. Any character except the alphanumeric ones should be considered as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

  was:
The current behavior of the `INITCAP` function is to turn the first character of each word uppercase and remains the other as is.

The desired behavior is to turn the first letter uppercase and the other lowercase. Any character except the alphanumeric ones should be considered as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13190) [C++] [Gandiva] Change behavior of INITCAP function
[ https://issues.apache.org/jira/browse/ARROW-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Louis Gotlib Ferreira updated ARROW-13190:
--------------------------------------------------
    Description: 
The current behavior of the `INITCAP` function is to turn the first character of each word uppercase and remains the other as is.

The desired behavior is to turn the first letter uppercase and the other lowercase. Any character except the alphanumeric ones should be considered as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

  was:
The current behavior of the `INITCAP` function is to turn the first character of each word uppercase and remains the other as is.

The desired behavior is to turn the first letter uppercase and the other lowercase. Any character except the alphanumeric ones should be considered as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-13190) [C++] [Gandiva] Change behavior of INITCAP function
Anthony Louis Gotlib Ferreira created ARROW-13190:
-----------------------------------------------------
             Summary: [C++] [Gandiva] Change behavior of INITCAP function
                 Key: ARROW-13190
                 URL: https://issues.apache.org/jira/browse/ARROW-13190
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++ - Gandiva
            Reporter: Anthony Louis Gotlib Ferreira
            Assignee: Anthony Louis Gotlib Ferreira

The current behavior of the `INITCAP` function is to uppercase the first character of each word and leave the remaining characters as they are.

The desired behavior is to uppercase the first letter of each word and lowercase the rest. Any non-alphanumeric character should be treated as a word separator.

That behavior is based on these database systems:
* [Oracle|https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions065.htm]
* [Postgres|https://w3resource.com/PostgreSQL/initcap-function.php]
* [Redshift|https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html]
* [Splice Machine|https://doc.splicemachine.com/sqlref_builtinfcns_initcap.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
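The desired semantics can be modeled with a short Python sketch. This is a hypothetical illustration of the behavior described in the issue, not the Gandiva implementation; the function name initcap is chosen only for clarity.

```python
import re

def initcap(text: str) -> str:
    """Uppercase the first character of each alphanumeric run and
    lowercase the rest; every non-alphanumeric character acts as a
    word separator and passes through unchanged."""
    def cap(match: re.Match) -> str:
        word = match.group(0)
        return word[0].upper() + word[1:].lower()
    # Words are maximal alphanumeric runs; separators are left as-is.
    return re.sub(r"[0-9A-Za-z]+", cap, text)

print(initcap("heLLo-WORLD foo_bar"))  # Hello-World Foo_Bar
```

Note that both the hyphen and the underscore split words here, matching the "any non-alphanumeric character is a separator" rule the listed databases follow.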
[jira] [Resolved] (ARROW-13119) [R] Set empty schema in scalar Expressions
[ https://issues.apache.org/jira/browse/ARROW-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Cook resolved ARROW-13119.
------------------------------
    Resolution: Fixed

Resolved by https://github.com/apache/arrow/pull/10563

> [R] Set empty schema in scalar Expressions
> -------------------------------------------
>
>                 Key: ARROW-13119
>                 URL: https://issues.apache.org/jira/browse/ARROW-13119
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Assignee: Ian Cook
>            Priority: Major
>             Fix For: 5.0.0
>
> Closely related to ARROW-13117 is the problem of {{type()}} and {{type_id()}} not working for scalar Expressions. For example, currently this happens:
> {code:r}
> > Expression$scalar("foo")$type()
> Error: !is.null(schema) is not TRUE
> > Expression$scalar(42L)$type()
> Error: !is.null(schema) is not TRUE
> {code}
> This is what we want to happen:
> {code:r}
> > Expression$scalar("foo")$type()
> Utf8
> string
> > Expression$scalar(42L)$type()
> Int32
> int32
> {code}
> This is simple to solve; we just need to set {{schema}} to an empty schema for all scalar Expressions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (ARROW-13117) [R] Retain schema in new Expressions
[ https://issues.apache.org/jira/browse/ARROW-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Cook resolved ARROW-13117.
------------------------------
    Resolution: Fixed

Issue resolved by pull request 10563
[https://github.com/apache/arrow/pull/10563]

> [R] Retain schema in new Expressions
> -------------------------------------
>
>                 Key: ARROW-13117
>                 URL: https://issues.apache.org/jira/browse/ARROW-13117
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Assignee: Ian Cook
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When a new Expression is created, {{schema}} should be retained from the expression(s) it was created from. That way, the {{type()}} and {{type_id()}} methods of the new Expression will work. For example, currently this happens:
> {code:r}
> > x <- Expression$field_ref("x")
> > x$schema <- Schema$create(x = int32())
> >
> > y <- Expression$field_ref("y")
> > y$schema <- Schema$create(y = int32())
> >
> > Expression$create("add_checked", x, y)$type()
> Error: !is.null(schema) is not TRUE
> {code}
> This is what we want to happen:
> {code:r}
> > Expression$create("add_checked", x, y)$type()
> Int32
> int32
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369707#comment-17369707 ]

Weston Pace commented on ARROW-13187:
--------------------------------------

Also, it seems this does not happen when repeatedly reading in a parquet file. So maybe the problem isn't in the shared Arrow->Python code, or maybe it's particular to the way the CSV reader creates the table.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-13187:
--------------------------------
    Component/s: C++

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-13187) [c++][python] Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-13187:
--------------------------------
    Summary: [c++][python] Possibly memory not deallocated when reading in CSV  (was: Possibly memory not deallocated when reading in CSV)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-13189) [R] Should we be handling row-level metadata at all?
[ https://issues.apache.org/jira/browse/ARROW-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369703#comment-17369703 ]

Neal Richardson commented on ARROW-13189:
------------------------------------------

I think we should ignore row-level metadata in general and (here lies the bigger task) provide an interface (via S3 methods, most likely) for people to define custom boxing/unboxing of custom data types where our general metadata handling is insufficient or suboptimal. This is essentially allowing R developers to define Extension Types.

> [R] Should we be handling row-level metadata at all?
> -----------------------------------------------------
>
>                 Key: ARROW-13189
>                 URL: https://issues.apache.org/jira/browse/ARROW-13189
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 3.0.0, 4.0.0, 4.0.1
>            Reporter: Jonathan Keane
>            Priority: Major
>
> In order to support things like sf columns, we have added code that handles row-level metadata (https://github.com/apache/arrow/pull/8549 and https://github.com/apache/arrow/pull/9182).
> This works just fine in a single-table or single-parquet-file circumstance, but when using a dataset (even without filtering!) it can produce some surprising (and wrong) results (see the reprex below).
> There is already some work underway in ARROW-12542 to make it easier to convert the row-element-level attributes to a struct and store it in the column, but that's still a bit off. Even once that's done, should we disable this entirely? Stop, or ignore and warn, given that with datasets row-level metadata isn't applied correctly (there's no way for us to get the ordering right)? Something else?
> {code:r}
> library(arrow)
>
> df <- tibble::tibble(
>   part = rep(1:2, 13),
>   let = letters
> )
>
> df$embedded_attr <- lapply(seq_len(nrow(df)), function(i) {
>   value <- "nothing"
>   attributes(value) <- list(letter = df[[i, "let"]])
>   value
> })
>
> df_from_tab <- as.data.frame(Table$create(df))
>
> # this should be (and is) "b"
> attributes(df_from_tab[df_from_tab$let == "b", "embedded_attr"][[1]][[1]])
> #> $letter
> #> [1] "b"
>
> # the dfs are the same
> waldo::compare(df, df_from_tab)
> #> ✓ No differences
>
> # now via dataset
> dir <- "ds-dir"
> write_dataset(df, path = dir, partitioning = "part")
> ds <- open_dataset(dir)
> df_from_ds <- dplyr::collect(ds)
>
> # this should be (and is not) "b"
> attributes(df_from_ds[df_from_ds$let == "b", "embedded_attr"][[1]][[1]])
> #> $letter
> #> [1] "n"
>
> # Even controlling for order, the dfs are not the same
> waldo::compare(dplyr::arrange(df, let), dplyr::arrange(df_from_ds, let))
> #> `names(old)`: "part" "let" "embedded_attr"
> #> `names(new)`:        "let" "embedded_attr" "part"
> #>
> #> `attr(old$embedded_attr[[2]], 'letter')`: "b"
> #> `attr(new$embedded_attr[[2]], 'letter')`: "n"
> #>
> #> `attr(old$embedded_attr[[3]], 'letter')`: "c"
> #> `attr(new$embedded_attr[[3]], 'letter')`: "b"
> #>
> #> `attr(old$embedded_attr[[4]], 'letter')`: "d"
> #> `attr(new$embedded_attr[[4]], 'letter')`: "o"
> #>
> #> `attr(old$embedded_attr[[5]], 'letter')`: "e"
> #> `attr(new$embedded_attr[[5]], 'letter')`: "c"
> #>
> #> `attr(old$embedded_attr[[6]], 'letter')`: "f"
> #> `attr(new$embedded_attr[[6]], 'letter')`: "p"
> #>
> #> `attr(old$embedded_attr[[7]], 'letter')`: "g"
> #> `attr(new$embedded_attr[[7]], 'letter')`: "d"
> #>
> #> `attr(old$embedded_attr[[8]], 'letter')`: "h"
> #> `attr(new$embedded_attr[[8]], 'letter')`: "q"
> #>
> #> `attr(old$embedded_attr[[9]], 'letter')`: "i"
> #> `attr(new$embedded_attr[[9]], 'letter')`: "e"
> #>
> #> `attr(old$embedded_attr[[10]], 'letter')`: "j"
> #> `attr(new$embedded_attr[[10]], 'letter')`: "r"
> #>
> #> And 15 more differences ...
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-13187) Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369702#comment-17369702 ] Weston Pace commented on ARROW-13187: - Great reproduction, thank you. I can reproduce this on 4.0.0 but not on 3.0.0. A few observations so far: pa.total_allocated_bytes is increasing so it is not a dynamic allocator blowup issue. "del table" prevents the out-of-ram (same as the table.slice above). "gc.collect" prevents the out-of-ram Those workarounds shouldn't be necessary however. When read_in_the_csv exits the table is no longer needed, it's refcount should decrease by 1, and it should be eligible for garbage collection. Combined with the fact that this doesn't occur on 3.0.0 (both environments are using python 3.8 although 3.8.6 vs 3.8.8 but I doubt it's a python change) I think this means that a circular reference was introduced in the Arrow->Python code between 3.0.0 and 4.0.0. > Possibly memory not deallocated when reading in CSV > --- > > Key: ARROW-13187 > URL: https://issues.apache.org/jira/browse/ARROW-13187 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 4.0.1 >Reporter: Simon >Priority: Minor > > When one reads in a table from CSV in pyarrow version 4.0.1, it appears that > the read-in table variable is not freed (or not fast enough). I'm unsure if > this is because of pyarrow or because of the way pyarrow memory allocation > interacts with Python memory allocation. I encountered it when processing > many large CSVs sequentially. > When I run the following piece of code, the RAM memory usage increases quite > rapidly until it runs out of memory. 
> {code:python} > import pyarrow as pa > import pyarrow.csv > # Generate some CSV file to read in > print("Generating CSV") > with open("example.csv", "w+") as f_out: > for i in range(0, 1000): > f_out.write("123456789,abc def ghi jkl\n") > def read_in_the_csv(): > table = pa.csv.read_csv("example.csv") > print(table) # Not strictly necessary to replicate bug, table can also > be an unused variable > # This will free up the memory, as a workaround: > # table = table.slice(0, 0) > # Read in the CSV many times > print("Reading in a CSV many times") > for j in range(10): > read_in_the_csv() > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
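The circular-reference hypothesis above can be illustrated without pyarrow at all, using Python's `gc` module: an object caught in a reference cycle is not freed by refcounting when its name goes out of scope, only by the cyclic garbage collector. This is a minimal sketch (the `Table` class here is a stand-in, not a pyarrow Table):

```python
import gc

class Table:
    """Stand-in for an object caught in a reference cycle."""
    pass

def make_cycle():
    t = Table()
    t.self_ref = t  # refcount never reaches zero on its own
    # t goes out of scope here, but only the cyclic GC can reclaim it

gc.disable()                # simulate "the cyclic GC hasn't run yet"
for _ in range(100):
    make_cycle()
unreachable = gc.collect()  # an explicit collect reclaims the cycles
gc.enable()
print(unreachable >= 100)   # True: at least one object per cycle was stuck
```

This mirrors why `gc.collect()` works as a workaround in the report: the tables were reachable only through a cycle, so plain refcounting never released them.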
[jira] [Updated] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-13174: --- Labels: timestamp (was: ) > [C++][Compute] Add strftime kernel > -- > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > Labels: timestamp > > To convert timestamps to a string representation with an arbitrary format we > require a strftime kernel (the inverse operation of the {{strptime}} kernel > we already have). > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369653#comment-17369653 ] Rok Mihevc commented on ARROW-13174: This should also implement TemporalStrftimeOptions with format and locale properties. > [C++][Compute] Add strftime kernel > -- > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > To convert timestamps to a string representation with an arbitrary format we > require a strftime kernel (the inverse operation of the {{strptime}} kernel > we already have). > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13186) [R] Implement type determination more cleanly
[ https://issues.apache.org/jira/browse/ARROW-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369622#comment-17369622 ] Ian Cook commented on ARROW-13186: -- Nice, thanks [~npr]. Yes, using {{eval_select}} across the board is ARROW-12105. I hope to get that done for 5.0.0. > [R] Implement type determination more cleanly > - > > Key: ARROW-13186 > URL: https://issues.apache.org/jira/browse/ARROW-13186 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 5.0.0 >Reporter: Ian Cook >Priority: Major > > In the R package, there are several improvements in data type determination > in the 5.0.0 release. The implementation of these improvements used a kludge: > They made it possible to store a {{Schema}} in an {{Expression}} object in > the R package; when set, this {{Schema}} is retained in derivative > {{Expression}} objects. This was the most convenient way to make the > {{Schema}} available for passing it to the {{type_id()}} method, which > requires it. But this introduces a deviation of the R package's > {{Expression}} object from the C++ library's {{Expression}} object, and it > makes our type determination functions work differently than the other R > functions in {{nse_funcs}}. > The Jira issues in which these somewhat kludgy improvements were made are: > * allowing a schema to be stored in the {{Expression}} object, and > implementing type determination functions in a way that uses that schema > (ARROW-12781) > * retaining a schema in derivative {{Expression}} objects (ARROW-13117) > * setting an empty schema in scalar literal {{Expression}} objects > (ARROW-13119) > From the perspective of the R package, an ideal way to implement type > determination functions would be to call a {{type_id}} kernel through the > {{call_function}} interface, but this was rejected in ARROW-13167. Consider > other ways that we might improve this implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13173) [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
[ https://issues.apache.org/jira/browse/ARROW-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13173: --- Labels: pull-request-available (was: ) > [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally > - > > Key: ARROW-13173 > URL: https://issues.apache.org/jira/browse/ARROW-13173 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 4.0.0, 4.0.1 >Reporter: Yibo Cai >Assignee: Weston Pace >Priority: Major > Labels: pull-request-available > Fix For: 5.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Observed one test case failure from Travis CI arm64 job. > https://travis-ci.com/github/apache/arrow/jobs/518630381#L2271 > {{TestAsyncUtil.ReadaheadFailed}} asserted at > https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_generator_test.cc#L1131 > Looks _SleepABit()_ cannot guarantee that _finished_ will be set in time, > especially on busy CI hosts where many jobs share one machine. > cc [~westonpace] -- This message was sent by Atlassian Jira (v8.3.4#803005)
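The test flakiness described above is the classic fixed-sleep race: a single `SleepABit()` can lose on an overloaded CI host. A deadline-based poll is the usual fix; here is a minimal Python sketch of the pattern (the Arrow test itself is C++, so this only illustrates the idea):

```python
import threading
import time

def wait_for(predicate, timeout=5.0, interval=0.01):
    """Poll until predicate() is true or the deadline passes.

    Unlike one fixed-length sleep, a deadline tolerates arbitrary
    scheduling delay on busy machines, up to the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()

finished = threading.Event()
threading.Timer(0.05, finished.set).start()  # set asynchronously, like the test's flag
print(wait_for(finished.is_set))  # True
```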
[jira] [Created] (ARROW-13189) [R] Should we be handling row-level metadata at all?
Jonathan Keane created ARROW-13189: -- Summary: [R] Should we be handling row-level metadata at all? Key: ARROW-13189 URL: https://issues.apache.org/jira/browse/ARROW-13189 Project: Apache Arrow Issue Type: Bug Components: R Affects Versions: 4.0.1, 4.0.0, 3.0.0 Reporter: Jonathan Keane In order to support things like SF columns, we have added code that handles row-level metadata (https://github.com/apache/arrow/pull/8549 and https://github.com/apache/arrow/pull/9182). These work just fine in a single table or single parquet file circumstance, but when using a dataset (even without filtering!) this can produce some surprising (and wrong) results (see reprex below). There is already some work underway to make it easier to convert the row-element-level attributes to a struct + store it in the column in the ARROW-12542 work, but that's still a bit off. But even once that's done, should we disable this totally? Stop or ignore+warn that with datasets row-level metadata isn't applied (since there's no way for us to get the ordering right)? Something else? 
{code:r} library(arrow) df <- tibble::tibble( part = rep(1:2, 13), let = letters ) df$embedded_attr <- lapply(seq_len(nrow(df)), function(i) { value <- "nothing" attributes(value) <- list(letter = df[[i, "let"]]) value }) df_from_tab <- as.data.frame(Table$create(df)) # this should be (and is) "b" attributes(df_from_tab[df_from_tab$let == "b", "embedded_attr"][[1]][[1]]) #> $letter #> [1] "b" # the dfs are the same waldo::compare(df, df_from_tab) #> ✓ No differences # now via dataset dir <- "ds-dir" write_dataset(df, path = dir, partitioning = "part") ds <- open_dataset(dir) df_from_ds <- dplyr::collect(ds) # this should be (and is not) "b" attributes(df_from_ds[df_from_ds$let == "b", "embedded_attr"][[1]][[1]]) #> $letter #> [1] "n" # Even controlling for order, the dfs are not the same waldo::compare(dplyr::arrange(df, let), dplyr::arrange(df_from_ds, let)) #> `names(old)`: "part" "let" "embedded_attr" #> `names(new)`:"let" "embedded_attr" "part" #> #> `attr(old$embedded_attr[[2]], 'letter')`: "b" #> `attr(new$embedded_attr[[2]], 'letter')`: "n" #> #> `attr(old$embedded_attr[[3]], 'letter')`: "c" #> `attr(new$embedded_attr[[3]], 'letter')`: "b" #> #> `attr(old$embedded_attr[[4]], 'letter')`: "d" #> `attr(new$embedded_attr[[4]], 'letter')`: "o" #> #> `attr(old$embedded_attr[[5]], 'letter')`: "e" #> `attr(new$embedded_attr[[5]], 'letter')`: "c" #> #> `attr(old$embedded_attr[[6]], 'letter')`: "f" #> `attr(new$embedded_attr[[6]], 'letter')`: "p" #> #> `attr(old$embedded_attr[[7]], 'letter')`: "g" #> `attr(new$embedded_attr[[7]], 'letter')`: "d" #> #> `attr(old$embedded_attr[[8]], 'letter')`: "h" #> `attr(new$embedded_attr[[8]], 'letter')`: "q" #> #> `attr(old$embedded_attr[[9]], 'letter')`: "i" #> `attr(new$embedded_attr[[9]], 'letter')`: "e" #> #> `attr(old$embedded_attr[[10]], 'letter')`: "j" #> `attr(new$embedded_attr[[10]], 'letter')`: "r" #> #> And 15 more differences ... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
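The failure mode in the reprex above can be reduced to a language-agnostic sketch: row-level metadata is stored by original row position, but a partitioned dataset write groups rows by partition key, so reattaching metadata positionally after a dataset read mislabels rows. A hypothetical miniature in Python:

```python
# Rows of (value, partition_key); metadata is stored by original position,
# mimicking how row-level attributes are kept in the R metadata.
rows = [("a", 1), ("b", 2), ("c", 1), ("d", 2)]
metadata = [letter for letter, _ in rows]

# A partitioned write groups rows by partition key, changing row order.
reordered = sorted(rows, key=lambda r: r[1])

# Reapplying metadata by position now attaches the wrong labels.
reapplied = list(zip([r[0] for r in reordered], metadata))
print(reapplied)  # [('a', 'a'), ('c', 'b'), ('b', 'c'), ('d', 'd')]
```

The middle two rows get each other's metadata, which is exactly the kind of scrambling `waldo::compare()` reports in the reprex.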
[jira] [Commented] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369614#comment-17369614 ] Mauricio 'Pachá' Vargas Sepúlveda commented on ARROW-13188: --- right, closing now > [R] [C++] Implement substr/str_sub for dplyr queries > > > Key: ARROW-13188 > URL: https://issues.apache.org/jira/browse/ARROW-13188 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.1 >Reporter: Mauricio 'Pachá' Vargas Sepúlveda >Priority: Minor > > I would be highly desirable to be able to use (base) substr and/or (stringr) > str_sub in dplyr queries, like > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} > Of course, this needs implementation, but similar to ARROW-13107, points to > an easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12992) [R] bindings for substr
[ https://issues.apache.org/jira/browse/ARROW-12992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mauricio 'Pachá' Vargas Sepúlveda updated ARROW-12992: -- Description: Followup to ARROW-10557, which implemented the C++ current state: {code:r} library(arrow) library(dplyr) library(stringr) # get animal products, year 20919 open_dataset( "../cepii-datasets-arrow/parquet/baci_hs92", partitioning = c("year", "reporter_iso") ) %>% filter( year == 2019, str_sub(product_code, 1, 2) == "01" ) %>% collect() Error: Filter expression not supported for Arrow Datasets: str_sub(product_code, 1, 2) == "01" Call collect() first to pull data into R. {code} was:Followup to ARROW-10557, which implemented the C++ > [R] bindings for substr > --- > > Key: ARROW-12992 > URL: https://issues.apache.org/jira/browse/ARROW-12992 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Neal Richardson >Priority: Major > Fix For: 5.0.0 > > > Followup to ARROW-10557, which implemented the C++ > current state: > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mauricio 'Pachá' Vargas Sepúlveda closed ARROW-13188. - Resolution: Duplicate > [R] [C++] Implement substr/str_sub for dplyr queries > > > Key: ARROW-13188 > URL: https://issues.apache.org/jira/browse/ARROW-13188 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.1 >Reporter: Mauricio 'Pachá' Vargas Sepúlveda >Priority: Minor > > I would be highly desirable to be able to use (base) substr and/or (stringr) > str_sub in dplyr queries, like > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} > Of course, this needs implementation, but similar to ARROW-13107, points to > an easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369607#comment-17369607 ] Ian Cook commented on ARROW-13188: -- Dup of ARROW-12992? > [R] [C++] Implement substr/str_sub for dplyr queries > > > Key: ARROW-13188 > URL: https://issues.apache.org/jira/browse/ARROW-13188 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.1 >Reporter: Mauricio 'Pachá' Vargas Sepúlveda >Priority: Minor > > I would be highly desirable to be able to use (base) substr and/or (stringr) > str_sub in dplyr queries, like > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} > Of course, this needs implementation, but similar to ARROW-13107, points to > an easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13186) [R] Implement type determination more cleanly
[ https://issues.apache.org/jira/browse/ARROW-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369591#comment-17369591 ] Neal Richardson commented on ARROW-13186: - I did some experimenting and got something that works for the arrow_mask/arrow_eval code paths, but any paths that use tidyselect::eval_select (currently only relocate but presumably others will be added) need slightly different handling and I didn't get the chance to work out a solution there yet. The idea is that we stick the schema as a "data pronoun" like thing in the data mask, so that any functions called inside arrow_eval() can call up and find it. {code} diff --git a/r/R/dplyr-eval.R b/r/R/dplyr-eval.R index de68d2f2c..eda40dc23 100644 --- a/r/R/dplyr-eval.R +++ b/r/R/dplyr-eval.R @@ -86,9 +86,6 @@ arrow_mask <- function(.data) { f_env[[f]] <- fail } - # Assign the schema to the expressions - map(.data$selected_columns, ~(.$schema <- .data$.data$schema)) - # Add the column references and make the mask out <- new_data_mask( new_environment(.data$selected_columns, parent = f_env), @@ -98,5 +95,18 @@ arrow_mask <- function(.data) { # TODO: figure out what rlang::as_data_pronoun does/why we should use it # (because if we do we get `Error: Can't modify the data pronoun` in mutate()) out$.data <- .data$selected_columns + out$.schema <- .data$.data$schema out } + +arrow_eval_schema <- function() { + n <- 1 + env <- parent.frame(n) + while(!identical(env, .GlobalEnv)) { +if (".schema" %in% ls(env, all.names = TRUE)) { + return(get(".schema", env)) +} +n <- n + 1 +env <- parent.frame(n) + } +} {code} Then each of the is* functions calls arrow_eval_schema() to get it. The benefit of something like this is that we avoid the cost of tracking/merging schemas when building expressions and only have to grab it when we need it (which is rarely since none of the other compute kernels require it). 
> [R] Implement type determination more cleanly > - > > Key: ARROW-13186 > URL: https://issues.apache.org/jira/browse/ARROW-13186 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 5.0.0 >Reporter: Ian Cook >Priority: Major > > In the R package, there are several improvements in data type determination > in the 5.0.0 release. The implementation of these improvements used a kludge: > They made it possible to store a {{Schema}} in an {{Expression}} object in > the R package; when set, this {{Schema}} is retained in derivative > {{Expression}} objects. This was the most convenient way to make the > {{Schema}} available for passing it to the {{type_id()}} method, which > requires it. But this introduces a deviation of the R package's > {{Expression}} object from the C++ library's {{Expression}} object, and it > makes our type determination functions work differently than the other R > functions in {{nse_funcs}}. > The Jira issues in which these somewhat kludgy improvements were made are: > * allowing a schema to be stored in the {{Expression}} object, and > implementing type determination functions in a way that uses that schema > (ARROW-12781) > * retaining a schema in derivative {{Expression}} objects (ARROW-13117) > * setting an empty schema in scalar literal {{Expression}} objects > (ARROW-13119) > From the perspective of the R package, an ideal way to implement type > determination functions would be to call a {{type_id}} kernel through the > {{call_function}} interface, but this was rejected in ARROW-13167. Consider > other ways that we might improve this implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
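The frame-walking trick in the `arrow_eval_schema()` sketch above has a direct analogue in Python, which may make the idea easier to evaluate: climb the calling frames until a sentinel binding is found, so only the functions that need the schema pay for looking it up. The names here are hypothetical:

```python
import inspect

def find_in_callers(name):
    """Walk outward through calling frames looking for a binding,
    similar in spirit to climbing R parent frames for `.schema`."""
    frame = inspect.currentframe().f_back
    while frame is not None:
        if name in frame.f_locals:
            return frame.f_locals[name]
        frame = frame.f_back
    raise LookupError(name)

def evaluate_with_schema():
    _schema = "part: int32, let: string"  # sentinel placed by the evaluator
    return type_id_helper()

def type_id_helper():
    # A helper deep in the call stack recovers the schema on demand,
    # without it being threaded through every expression.
    return find_in_callers("_schema")

print(evaluate_with_schema())  # part: int32, let: string
```

The trade-off is the same one Neal notes: lookup cost is paid only by the rare functions that need the schema, at the price of an implicit, stack-dependent channel.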
[jira] [Updated] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mauricio 'Pachá' Vargas Sepúlveda updated ARROW-13188: -- Summary: [R] [C++] Implement substr/str_sub for dplyr queries (was: [R] [C++] Implement SQL-alike distinct() for dplyr queries) > [R] [C++] Implement substr/str_sub for dplyr queries > > > Key: ARROW-13188 > URL: https://issues.apache.org/jira/browse/ARROW-13188 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Affects Versions: 4.0.1 >Reporter: Mauricio 'Pachá' Vargas Sepúlveda >Priority: Minor > > I would be highly desirable to be able to use (base) substr and/or (stringr) > str_sub in dplyr queries, like > {code:r} > library(arrow) > library(dplyr) > library(stringr) > # get animal products, year 20919 > open_dataset( > "../cepii-datasets-arrow/parquet/baci_hs92", > partitioning = c("year", "reporter_iso") > ) %>% > filter( > year == 2019, > str_sub(product_code, 1, 2) == "01" > ) %>% > collect() > Error: Filter expression not supported for Arrow Datasets: > str_sub(product_code, 1, 2) == "01" > Call collect() first to pull data into R. > {code} > Of course, this needs implementation, but similar to ARROW-13107, points to > an easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13188) [R] [C++] Implement SQL-alike distinct() for dplyr queries
Mauricio 'Pachá' Vargas Sepúlveda created ARROW-13188: - Summary: [R] [C++] Implement SQL-alike distinct() for dplyr queries Key: ARROW-13188 URL: https://issues.apache.org/jira/browse/ARROW-13188 Project: Apache Arrow Issue Type: Bug Components: C++, R Affects Versions: 4.0.1 Reporter: Mauricio 'Pachá' Vargas Sepúlveda It would be highly desirable to be able to use (base) substr and/or (stringr) str_sub in dplyr queries, like {code:r} library(arrow) library(dplyr) library(stringr) # get animal products, year 2019 open_dataset( "../cepii-datasets-arrow/parquet/baci_hs92", partitioning = c("year", "reporter_iso") ) %>% filter( year == 2019, str_sub(product_code, 1, 2) == "01" ) %>% collect() Error: Filter expression not supported for Arrow Datasets: str_sub(product_code, 1, 2) == "01" Call collect() first to pull data into R. {code} Of course, this needs implementation, but, similar to ARROW-13107, it points to easier integration with dplyr. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-13173) [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
[ https://issues.apache.org/jira/browse/ARROW-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace reassigned ARROW-13173: --- Assignee: Weston Pace > [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally > - > > Key: ARROW-13173 > URL: https://issues.apache.org/jira/browse/ARROW-13173 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 4.0.0, 4.0.1 >Reporter: Yibo Cai >Assignee: Weston Pace >Priority: Major > Fix For: 5.0.0 > > > Observed one test case failure from Travis CI arm64 job. > https://travis-ci.com/github/apache/arrow/jobs/518630381#L2271 > {{TestAsyncUtil.ReadaheadFailed}} asserted at > https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_generator_test.cc#L1131 > Looks _SleepABit()_ cannot guarantee that _finished_ will be set in time, > especially on busy CI hosts where many jobs share one machine. > cc [~westonpace] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13187) Possibly memory not deallocated when reading in CSV
[ https://issues.apache.org/jira/browse/ARROW-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon updated ARROW-13187: -- Description: When one reads in a table from CSV in pyarrow version 4.0.1, it appears that the read-in table variable is not freed (or not fast enough). I'm unsure if this is because of pyarrow or because of the way pyarrow memory allocation interacts with Python memory allocation. I encountered it when processing many large CSVs sequentially. When I run the following piece of code, the RAM memory usage increases quite rapidly until it runs out of memory. {code:python} import pyarrow as pa import pyarrow.csv # Generate some CSV file to read in print("Generating CSV") with open("example.csv", "w+") as f_out: for i in range(0, 1000): f_out.write("123456789,abc def ghi jkl\n") def read_in_the_csv(): table = pa.csv.read_csv("example.csv") print(table) # Not strictly necessary to replicate bug, table can also be an unused variable # This will free up the memory, as a workaround: # table = table.slice(0, 0) # Read in the CSV many times print("Reading in a CSV many times") for j in range(10): read_in_the_csv() {code} was: When one reads in a table from CSV in pyarrow version 4.0.1, it appears that the read-in table variable is not freed (or not fast enough). I'm unsure if this is because of pyarrow or because of the way pyarrow memory allocation interacts with Python memory allocation. I encountered it when processing many large CSVs sequentially. When I run the following piece of code, the RAM memory usage increases quite rapidly until it runs out of memory. 
{code:python} import pyarrow as pa import pyarrow.csv # Generate some CSV file to read in print("Generating CSV") with open("example.csv", "w+") as f_out: for i in range(0, 1000): f_out.write("123456789,abc def ghi jkl\n") def read_in_the_csv(): table = pa.csv.read_csv("example.csv") print(table) # Not strictly necessary to replicate bug, table can also be an unused variable # This will free up the memory, as a workaround: # table = table.slice(0, 0) # Read in the print("Reading in a CSV many times") for j in range(10): read_in_the_csv() {code} > Possibly memory not deallocated when reading in CSV > --- > > Key: ARROW-13187 > URL: https://issues.apache.org/jira/browse/ARROW-13187 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 4.0.1 >Reporter: Simon >Priority: Minor > > When one reads in a table from CSV in pyarrow version 4.0.1, it appears that > the read-in table variable is not freed (or not fast enough). I'm unsure if > this is because of pyarrow or because of the way pyarrow memory allocation > interacts with Python memory allocation. I encountered it when processing > many large CSVs sequentially. > When I run the following piece of code, the RAM memory usage increases quite > rapidly until it runs out of memory. > {code:python} > import pyarrow as pa > import pyarrow.csv > # Generate some CSV file to read in > print("Generating CSV") > with open("example.csv", "w+") as f_out: > for i in range(0, 1000): > f_out.write("123456789,abc def ghi jkl\n") > def read_in_the_csv(): > table = pa.csv.read_csv("example.csv") > print(table) # Not strictly necessary to replicate bug, table can also > be an unused variable > # This will free up the memory, as a workaround: > # table = table.slice(0, 0) > # Read in the CSV many times > print("Reading in a CSV many times") > for j in range(10): > read_in_the_csv() > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13187) Possibly memory not deallocated when reading in CSV
Simon created ARROW-13187: - Summary: Possibly memory not deallocated when reading in CSV Key: ARROW-13187 URL: https://issues.apache.org/jira/browse/ARROW-13187 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 4.0.1 Reporter: Simon When one reads in a table from CSV in pyarrow version 4.0.1, it appears that the read-in table variable is not freed (or not fast enough). I'm unsure if this is because of pyarrow or because of the way pyarrow memory allocation interacts with Python memory allocation. I encountered it when processing many large CSVs sequentially. When I run the following piece of code, the RAM memory usage increases quite rapidly until it runs out of memory. {code:python} import pyarrow as pa import pyarrow.csv # Generate some CSV file to read in print("Generating CSV") with open("example.csv", "w+") as f_out: for i in range(0, 1000): f_out.write("123456789,abc def ghi jkl\n") def read_in_the_csv(): table = pa.csv.read_csv("example.csv") print(table) # Not strictly necessary to replicate bug, table can also be an unused variable # This will free up the memory, as a workaround: # table = table.slice(0, 0) # Read in the print("Reading in a CSV many times") for j in range(10): read_in_the_csv() {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12137) [R] New/improved vignette on dplyr features
[ https://issues.apache.org/jira/browse/ARROW-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated ARROW-12137: - Fix Version/s: (was: 5.0.0) 6.0.0 > [R] New/improved vignette on dplyr features > --- > > Key: ARROW-12137 > URL: https://issues.apache.org/jira/browse/ARROW-12137 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Neal Richardson >Assignee: Ian Cook >Priority: Major > Fix For: 6.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13151) [Python] Unable to read single child field of struct column from Parquet
[ https://issues.apache.org/jira/browse/ARROW-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369582#comment-17369582 ] Jim Pivarski commented on ARROW-13151: -- Great, thank you! I see now that you're calling it a "bug" was commenting on Joris's question about whether it ought to be supported, and that's what I was responding to. When this is fixed, it will be a new minimum version of Arrow for us because of its importance in our work. (As a side note, if you do change the ugly "list.item" access, we'll have to adjust, because of course we're generating column names to request them like that. So if that changes, we'll definitely need to pin a minimum Arrow version because the new names would be incompatible. I'd prefer it not to change; after all, it's what's in the Parquet schema. Maybe "synonyms" could hide that feature from high-level users, though that complicates the interface.) > [Python] Unable to read single child field of struct column from Parquet > > > Key: ARROW-13151 > URL: https://issues.apache.org/jira/browse/ARROW-13151 > Project: Apache Arrow > Issue Type: Bug > Components: Parquet, Python >Reporter: Angus Hollands >Priority: Major > > Given the following table > {code:java} > data = {"root": [[{"addr": {"this": 3, "that": 3}}]]} > table = pa.Table.from_pydict(data) > {code} > reading the nested column leads to a `pyarrow.lib.ArrowInvalid` error: > {code} > pq.write_table(table, "/tmp/table.parquet") > file = pq.ParquetFile("/tmp/table.parquet") > array = file.read(["root.list.item.addr.that"]) > {code} > Traceback: > {code} > Traceback (most recent call last): > File "", line 21, in > array = file.read(["root.list.item.addr.that"]) > File > "/home/angus/.mambaforge/envs/awkward/lib/python3.9/site-packages/pyarrow/parquet.py", > line 383, in read > return self.reader.read_all(column_indices=column_indices, > File "pyarrow/_parquet.pyx", line 1097, in > pyarrow._parquet.ParquetReader.read_all > File "pyarrow/error.pxi", 
line 97, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: List child array invalid: Invalid: Struct child > array #0 does not match type field: struct vs struct int64, this: int64> > {code} > It's possible that I don't quite understand this properly - am I doing > something wrong? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10998) [C++] Filesystems: detect if URI is passed where a file path is required and raise informative error
[ https://issues.apache.org/jira/browse/ARROW-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated ARROW-10998: - Fix Version/s: (was: 5.0.0) 6.0.0 > [C++] Filesystems: detect if URI is passed where a file path is required and > raise informative error > > > Key: ARROW-10998 > URL: https://issues.apache.org/jira/browse/ARROW-10998 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Joris Van den Bossche >Assignee: Ian Cook >Priority: Major > Labels: filesystem > Fix For: 6.0.0 > > > Currently, when passing a URI to a filesystem method (except for > {{from_uri}}) or other functions that accept a filesystem object, you can get > a rather cryptic error message (eg in this case about "No response body" for > S3, in the example below). > Ideally, the filesystem object knows its own prefix "scheme", and so can > detect if a user is passing a URI instead of file path, and we can provide a > nicer error message. > Example with S3: > {code:python} > >>> from pyarrow.fs import S3FileSystem > >>> fs = S3FileSystem(region="us-east-2") > >>> fs.get_file_info('s3://ursa-labs-taxi-data/2016/01/') > ... > OSError: When getting information for key '/ursa-labs-taxi-data/2016/01' in > bucket 's3:': AWS Error [code 100]: No response body. > >>> import pyarrow.parquet as pq > >>> table = pq.read_table('s3://ursa-labs-taxi-data/2016/01/data.parquet', > >>> filesystem=fs) > ... > OSError: When getting information for key > '/ursa-labs-taxi-data/2016/01/data.parquet' in bucket 's3:': AWS Error [code > 100]: No response body. > {code} > With a local filesystem, you actually get a not found file: > {code: python} > >>> fs = LocalFileSystem() > >>> fs.get_file_info("file:///home") > > {code} > cc [~apitrou] -- This message was sent by Atlassian Jira (v8.3.4#803005)
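The proposed check ("the filesystem object knows its own prefix scheme") amounts to scheme detection on the incoming string. A heuristic sketch in Python, assuming nothing about the eventual C++ implementation:

```python
from urllib.parse import urlparse

def looks_like_uri(path):
    """If a string handed to a filesystem method carries a URI scheme
    (s3://, file://, ...), an informative error can be raised instead of
    treating 's3:' as part of the path, as in the report above."""
    scheme = urlparse(path).scheme
    return len(scheme) > 1  # > 1 so Windows drive letters ('C:\\...') pass

for p in ["s3://ursa-labs-taxi-data/2016/01/", "file:///home", "/home/user"]:
    print(p, looks_like_uri(p))
```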
[jira] [Commented] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369567#comment-17369567 ] Gert Hulselmans commented on ARROW-10344: - Combined the above snippets in a cleaner way: https://github.com/aertslab/create_cisTarget_databases/commit/dcf70e60e915d2dc6850343960e7a7d3d3d56c41 > [Python] Get all columns names (or schema) from Feather file, before loading > whole Feather file > > > Key: ARROW-10344 > URL: https://issues.apache.org/jira/browse/ARROW-10344 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Affects Versions: 1.0.1 >Reporter: Gert Hulselmans >Priority: Major > > Is there a way to get all column names (or schema) from a Feather file before > loading the full Feather file? > My Feather files are big (like 100GB) and the names of the columns are > different per analysis and can't be hard coded. > {code:python} > import pyarrow.feather as feather > # Code here to check which columns are in the feather file. > ... > my_columns = ... > # Result is pandas.DataFrame > read_df = feather.read_feather('/path/to/file', columns=my_columns) > # Result is pyarrow.Table > read_arrow = feather.read_table('/path/to/file', columns=my_columns) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-12904) [Rust] Unable to load Feather v2 files created by pyarrow and pandas.
[ https://issues.apache.org/jira/browse/ARROW-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gert Hulselmans closed ARROW-12904. --- Resolution: Information Provided > [Rust] Unable to load Feather v2 files created by pyarrow and pandas. > - > > Key: ARROW-12904 > URL: https://issues.apache.org/jira/browse/ARROW-12904 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 4.0.1 > Environment: Ubuntu 20.04 >Reporter: Gert Hulselmans >Assignee: Joris Van den Bossche >Priority: Major > > arrow-rs seems unable to load Feather v2 files created by pyarrow (and > pandas), while it can read Feather v2 created by itself. > More info at: > [https://github.com/apache/arrow-rs/issues/286] > > Any idea what is missing in the Rust implementation (missing part of the > spec?)? > > {code:java} > More details: in both files, I am getting the following: > Reading Utf8 > field_node: FieldNode { length: 7, null_count: 0 } > offset buffer: Buffer { offset: 200, length: 55 } > offsets: [32, 0, 407708164, 545407072, 8388608, 67108864, 134217728, > 201326592] > values buffer: Buffer { offset: 256, length: 51 } > offsets[0] != 0 indicates a problem: offsets are expected to start from zero > on any array with offsets. > offsets[i+1] < offsets[i] for some i, which indicates a problem: offsets > are expected to be monotonically increasing > I do not have a root cause yet, these are just observations. > {code} > https://github.com/apache/arrow-rs/issues/286#issuecomment-839524898 > > In the attachment the following files can be found.
> {code:java} > test_pandas.feather: Original Feather file > test_arrow.feather: loading test_pandas.feather with pyarrow and saving with > pyarrow: df_pa = pa.feather.read_feather('test_pandas.feather') > test_polars.feather: Loading test_pandas.feather with pyarrow and saving > with polars (only this one can be read by arrow-rs) > test_pandas_from_polars.feather: Loading test_polars.feather with polars and > using the to_pandas option. > {code} > > [^test_feather_file.zip] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12904) [Rust] Unable to load Feather v2 files created by pyarrow and pandas.
[ https://issues.apache.org/jira/browse/ARROW-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369563#comment-17369563 ] Gert Hulselmans commented on ARROW-12904: - Looks like it was caused by the LZ4 compression used in the Feather file, which arrow-rs does not detect properly. > [Rust] Unable to load Feather v2 files created by pyarrow and pandas. > - > > Key: ARROW-12904 > URL: https://issues.apache.org/jira/browse/ARROW-12904 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 4.0.1 > Environment: Ubuntu 20.04 >Reporter: Gert Hulselmans >Assignee: Joris Van den Bossche >Priority: Major > > arrow-rs seems unable to load Feather v2 files created by pyarrow (and > pandas), while it can read Feather v2 created by itself. > More info at: > [https://github.com/apache/arrow-rs/issues/286] > > Any idea what is missing in the Rust implementation (missing part of the > spec?)? > > {code:java} > More details: in both files, I am getting the following: > Reading Utf8 > field_node: FieldNode { length: 7, null_count: 0 } > offset buffer: Buffer { offset: 200, length: 55 } > offsets: [32, 0, 407708164, 545407072, 8388608, 67108864, 134217728, > 201326592] > values buffer: Buffer { offset: 256, length: 51 } > offsets[0] != 0 indicates a problem: offsets are expected to start from zero > on any array with offsets. > offsets[i+1] < offsets[i] for some i, which indicates a problem: offsets > are expected to be monotonically increasing > I do not have a root cause yet, these are just observations. > {code} > https://github.com/apache/arrow-rs/issues/286#issuecomment-839524898 > > In the attachment the following files can be found.
> {code:java} > test_pandas.feather: Original Feather file > test_arrow.feather: loading test_pandas.feather with pyarrow and saving with > pyarrow: df_pa = pa.feather.read_feather('test_pandas.feather') > test_polars.feather: Loading test_pandas.feather with pyarrow and saving > with polars (only this one can be read by arrow-rs) > test_pandas_from_polars.feather: Loading test_polars.feather with polars and > using the to_pandas option. > {code} > > [^test_feather_file.zip] -- This message was sent by Atlassian Jira (v8.3.4#803005)
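If the LZ4 compression in the Feather file is indeed the culprit, a possible workaround on the writing side is sketched below: pyarrow's Feather v2 writer compresses with LZ4 by default, but {{compression="uncompressed"}} produces a plain Arrow IPC file that a reader without LZ4 support should be able to load:

{code:python}
import pyarrow as pa
import pyarrow.feather as feather

table = pa.table({"x": [1, 2, 3]})

# Default Feather v2 writes use LZ4; writing uncompressed produces a plain
# Arrow IPC file readable without any compression codec support.
feather.write_feather(table, "test_uncompressed.feather",
                      compression="uncompressed")

roundtrip = feather.read_table("test_uncompressed.feather")
print(roundtrip.num_rows)  # 3
{code}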
[jira] [Commented] (ARROW-13151) [Python] Unable to read single child field of struct column from Parquet
[ https://issues.apache.org/jira/browse/ARROW-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369562#comment-17369562 ] Micah Kornfield commented on ARROW-13151: - It should be very much supported. Like I said, this is a bug. It will take some tracing to figure out why it is occurring. > [Python] Unable to read single child field of struct column from Parquet > > > Key: ARROW-13151 > URL: https://issues.apache.org/jira/browse/ARROW-13151 > Project: Apache Arrow > Issue Type: Bug > Components: Parquet, Python >Reporter: Angus Hollands >Priority: Major > > Given the following table > {code:java} > data = {"root": [[{"addr": {"this": 3, "that": 3}}]]} > table = pa.Table.from_pydict(data) > {code} > reading the nested column leads to a `pyarrow.lib.ArrowInvalid` error: > {code} > pq.write_table(table, "/tmp/table.parquet") > file = pq.ParquetFile("/tmp/table.parquet") > array = file.read(["root.list.item.addr.that"]) > {code} > Traceback: > {code} > Traceback (most recent call last): > File "", line 21, in > array = file.read(["root.list.item.addr.that"]) > File > "/home/angus/.mambaforge/envs/awkward/lib/python3.9/site-packages/pyarrow/parquet.py", > line 383, in read > return self.reader.read_all(column_indices=column_indices, > File "pyarrow/_parquet.pyx", line 1097, in > pyarrow._parquet.ParquetReader.read_all > File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: List child array invalid: Invalid: Struct child > array #0 does not match type field: struct<that: int64> vs struct<that: int64, this: int64> > {code} > It's possible that I don't quite understand this properly - am I doing > something wrong? -- This message was sent by Atlassian Jira (v8.3.4#803005)
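Until the bug is traced, a workaround sketch is to read the whole struct column and project the child field in memory. This does not avoid the extra I/O that selective reading would save; it only avoids the error:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

data = {"root": [[{"addr": {"this": 3, "that": 3}}]]}
pq.write_table(pa.Table.from_pydict(data), "table.parquet")

# Read the full column, then drill into the struct in memory.
full = pq.read_table("table.parquet")
lists = full.column("root").chunk(0)   # ListArray of structs
addr = lists.flatten().field("addr")   # StructArray with fields 'this'/'that'
that = addr.field("that")              # Int64Array
print(that.to_pylist())  # [3]
{code}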
[jira] [Commented] (ARROW-13117) [R] Retain schema in new Expressions
[ https://issues.apache.org/jira/browse/ARROW-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369552#comment-17369552 ] Ian Cook commented on ARROW-13117: -- In recognition of the following... * Enabling the data type of an {{Expression}} to be knowable at all times is important for enabling broader support for expressions in dplyr verbs. * The PR here and the earlier changes in ARROW-12781 enable that, but in a somewhat kludgy way. * As kludges go, this one is not so bad, and it would be straightforward to replace with a cleaner implementation in the future. * At present, there is no clear way to implement this more cleanly, at least not without doing a major refactor or compromising its functionality. ... I created ARROW-13186 for future consideration of ways to implement this more cleanly, and for now I will merge this PR. > [R] Retain schema in new Expressions > > > Key: ARROW-13117 > URL: https://issues.apache.org/jira/browse/ARROW-13117 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > Fix For: 5.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When a new Expression is created, {{schema}} should be retained from the > expression(s) it was created from. That way, the {{type()}} and {{type_id()}} > methods of the new Expression will work. For example, currently this happens: > {code:r} > > x <- Expression$field_ref("x") > > x$schema <- Schema$create(x = int32()) > > > > y <- Expression$field_ref("y") > > y$schema <- Schema$create(y = int32()) > > > > Expression$create("add_checked", x, y)$type() > Error: !is.null(schema) is not TRUE {code} > This is what we want to happen: > {code:r} > > Expression$create("add_checked", x, y)$type() > Int32 > int32 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13186) [R] Implement type determination more cleanly
[ https://issues.apache.org/jira/browse/ARROW-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated ARROW-13186: - Description: In the R package, there are several improvements in data type determination in the 5.0.0 release. The implementation of these improvements used a kludge: They made it possible to store a {{Schema}} in an {{Expression}} object in the R package; when set, this {{Schema}} is retained in derivative {{Expression}} objects. This was the most convenient way to make the {{Schema}} available for passing it to the {{type_id()}} method, which requires it. But this introduces a deviation of the R package's {{Expression}} object from the C++ library's {{Expression}} object, and it makes our type determination functions work differently than the other R functions in {{nse_funcs}}. The Jira issues in which these somewhat kludgy improvements were made are: * allowing a schema to be stored in the {{Expression}} object, and implementing type determination functions in a way that uses that schema (ARROW-12781) * retaining a schema in derivative {{Expression}} objects (ARROW-13117) * setting an empty schema in scalar literal {{Expression}} objects (ARROW-13119) >From the perspective of the R package, an ideal way to implement type >determination functions would be to call a {{type_id}} kernel through the >{{call_function}} interface, but this was rejected in ARROW-13167. Consider >other ways that we might improve this implementation. was: In the R package, there are several improvements in data type determination in the 5.0.0 release. The implementation of these improvements used a kludge: They made it possible to store a {{Schema}} in an {{Expression}} object in the R package; when set, this {{Schema}} is retained in derivative {{Expression}}s. This was the most convenient way to make the {{Schema}} available for passing it to the {{type_id()}} method, which requires it. 
But this introduces a deviation of the R package's {{Expression}} object from the C++ library's {{Expression}} object, and it makes our type determination functions work differently than the other R functions in {{nse_funcs}}. The Jira issues in which these somewhat kludgy improvements were made are: * allowing a schema to be stored in the {{Expression}} object, and implementing type determination functions in a way that uses that schema (ARROW-12781) * retaining a schema in derivative {{Expression}} objects (ARROW-13117) * setting an empty schema in scalar literal {{Expression}} objects (ARROW-13119) >From the perspective of the R package, an ideal way to implement type >determination functions would be to call a {{type_id}} kernel through the >{{call_function}} interface, but this was rejected in ARROW-13167. Consider >other ways that we might improve this implementation. > [R] Implement type determination more cleanly > - > > Key: ARROW-13186 > URL: https://issues.apache.org/jira/browse/ARROW-13186 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 5.0.0 >Reporter: Ian Cook >Priority: Major > > In the R package, there are several improvements in data type determination > in the 5.0.0 release. The implementation of these improvements used a kludge: > They made it possible to store a {{Schema}} in an {{Expression}} object in > the R package; when set, this {{Schema}} is retained in derivative > {{Expression}} objects. This was the most convenient way to make the > {{Schema}} available for passing it to the {{type_id()}} method, which > requires it. But this introduces a deviation of the R package's > {{Expression}} object from the C++ library's {{Expression}} object, and it > makes our type determination functions work differently than the other R > functions in {{nse_funcs}}. 
> The Jira issues in which these somewhat kludgy improvements were made are: > * allowing a schema to be stored in the {{Expression}} object, and > implementing type determination functions in a way that uses that schema > (ARROW-12781) > * retaining a schema in derivative {{Expression}} objects (ARROW-13117) > * setting an empty schema in scalar literal {{Expression}} objects > (ARROW-13119) > From the perspective of the R package, an ideal way to implement type > determination functions would be to call a {{type_id}} kernel through the > {{call_function}} interface, but this was rejected in ARROW-13167. Consider > other ways that we might improve this implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13186) [R] Implement type determination more cleanly
Ian Cook created ARROW-13186: Summary: [R] Implement type determination more cleanly Key: ARROW-13186 URL: https://issues.apache.org/jira/browse/ARROW-13186 Project: Apache Arrow Issue Type: Improvement Components: R Affects Versions: 5.0.0 Reporter: Ian Cook In the R package, there are several improvements in data type determination in the 5.0.0 release. The implementation of these improvements used a kludge: They made it possible to store a {{Schema}} in an {{Expression}} object in the R package; when set, this {{Schema}} is retained in derivative {{Expression}}s. This was the most convenient way to make the {{Schema}} available for passing it to the {{type_id()}} method, which requires it. But this introduces a deviation of the R package's {{Expression}} object from the C++ library's {{Expression}} object, and it makes our type determination functions work differently than the other R functions in {{nse_funcs}}. The Jira issues in which these somewhat kludgy improvements were made are: * allowing a schema to be stored in the {{Expression}} object, and implementing type determination functions in a way that uses that schema (ARROW-12781) * retaining a schema in derivative {{Expression}} objects (ARROW-13117) * setting an empty schema in scalar literal {{Expression}} objects (ARROW-13119) >From the perspective of the R package, an ideal way to implement type >determination functions would be to call a {{type_id}} kernel through the >{{call_function}} interface, but this was rejected in ARROW-13167. Consider >other ways that we might improve this implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13151) [Python] Unable to read single child field of struct column from Parquet
[ https://issues.apache.org/jira/browse/ARROW-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369517#comment-17369517 ] Jim Pivarski commented on ARROW-13151: -- I hope reading a single field of a struct column is supported! It's an important use-case for us. In particle physics, our data consist of many collision events, each with a variable-length number of particles, and each particle is a struct with many fields. Often, there's even deeper structure than that, but this is the basic structure. These structs are very wide, with as many as a hundred fields, because the same dataset is used by 3000 authors, all doing different analyses on the same input dataset. Most individual data analysts don't access more than 10% of these struct fields. Therefore, it's important to be able to read the data lazily (in interactive analysis) or at least selectively (in high-throughput applications). Reading and decompressing data are often bottlenecks, so restricting data-loading to just the data we use is by itself a 10× improvement. We have a custom file format (ROOT) that is designed to provide exactly this selective reading, but we've been looking at Parquet as a more cross-language and non-domain-specific alternative. The bug that Angus reported arose in a framework that provides lazy-reading, Awkward Array's [ak.from_parquet|https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_parquet.html] function, which uses pyarrow.parquet.ParquetFile to read the data and convert it to Arrow, then converts the Arrow into Awkward Arrays (which are highly interchangeable with Arrow Arrays; conversion in both directions is usually zero-copy). [This whole feature|https://github.com/scikit-hep/awkward-1.0/blob/1ecfc3e29aaf1b79cd7e0e8fa1598452f3827c64/src/awkward/operations/convert.py#L3122-L3959] was designed around the idea that you can read individual struct fields, just as you can read individual columns. 
Just today, I found out that's not true, even in our basic case that does not trigger errors like Angus's:
{code:python}
>>> pq.write_table(pa.Table.from_pydict({"events": [{"muons": [{"pt": 10.5, "eta": -1.5, "phi": 0.1}]}]}), "/tmp/testy.parquet")
>>> pq.ParquetFile("/tmp/testy.parquet").read(["events.muons.list.item.pt"])  # reads all three
pyarrow.Table
events: struct<muons: list<item: struct<eta: double, phi: double, pt: double>>>
  child 0, muons: list<item: struct<eta: double, phi: double, pt: double>>
    child 0, item: struct<eta: double, phi: double, pt: double>
      child 0, eta: double
      child 1, phi: double
      child 2, pt: double
>>> pq.ParquetFile("/tmp/testy.parquet").read(["events.muons.list.item.eta"])  # reads all three
pyarrow.Table
events: struct<muons: list<item: struct<eta: double, phi: double, pt: double>>>
  child 0, muons: list<item: struct<eta: double, phi: double, pt: double>>
    child 0, item: struct<eta: double, phi: double, pt: double>
      child 0, eta: double
      child 1, phi: double
      child 2, pt: double
>>> pq.ParquetFile("/tmp/testy.parquet").read(["events.muons.list.item.phi"])  # reads all three
pyarrow.Table
events: struct<muons: list<item: struct<eta: double, phi: double, pt: double>>>
  child 0, muons: list<item: struct<eta: double, phi: double, pt: double>>
    child 0, item: struct<eta: double, phi: double, pt: double>
      child 0, eta: double
      child 1, phi: double
      child 2, pt: double
{code}
I hadn't realized that our attempts to read only "muon pt" or only "muon eta" were, in fact, reading all muon fields. (In the real datasets, muons have 32 fields, electrons have 47, taus have 37, jets have 30, photons have 27...) We could try to rearrange data to something shallower:
{code:python}
>>> pq.write_table(pa.Table.from_pydict({"muons": [{"pt": 10.5, "eta": -1.5, "phi": 0.1}]}), "/tmp/testy.parquet")
>>> pq.ParquetFile("/tmp/testy.parquet").read(["muons.pt"])
pyarrow.Table
muons: struct<pt: double>
  child 0, pt: double
>>> pq.ParquetFile("/tmp/testy.parquet").read(["muons.eta"])
pyarrow.Table
muons: struct<eta: double>
  child 0, eta: double
>>> pq.ParquetFile("/tmp/testy.parquet").read(["muons.phi"])
pyarrow.Table
muons: struct<phi: double>
  child 0, phi: double
{code}
but that puts a hard-to-predict constraint on data structures. In the above, aren't we "reading a single column of a struct column?" 
(I probably saw this behavior and assumed that it would continue to deeper structures, which is how I never noticed that they sometimes read all struct fields.) As a real-world case, here's a dataset that naturally has a structure that suffers from over-reading. It's not physics-related: it's a translation of the [Million Song Dataset|http://millionsongdataset.com/] into Parquet (side-note: it's losslessly 3× smaller than the original HDF5 files because of all the variable-length data): s3://pivarski-princeton/millionsongs/ . Lazily loading it has odd performance characteristics that I hadn't measured in detail until now:
{code:python}
In [1]: import awkward as ak

In [2]: songs = ak.from_parquet("/home/jpivarski/storage/data/million-song-dataset/full/millionsongs/millionsongs-A-zstd.parquet", lazy=True)

In [3]: %time songs.analysis.segments.loudness_start
CPU times: user 19.1 ms, sys: 0 ns, total: 19.1 ms
Wall time: 18.8 ms
Out[3]:

In [4]: %time songs.analysis.segments.loudness_max
CPU
{code}
[jira] [Created] (ARROW-13185) [MATLAB] Consider alternatives to placing the MEX binaries within the source tree
Sarah Gilmore created ARROW-13185: - Summary: [MATLAB] Consider alternatives to placing the MEX binaries within the source tree Key: ARROW-13185 URL: https://issues.apache.org/jira/browse/ARROW-13185 Project: Apache Arrow Issue Type: Task Components: MATLAB Reporter: Sarah Gilmore Since modifying the source directory via the build process is generally considered non-optimal, we may want to explore alternative approaches. For example, during the build process, we could create a derived source tree (a copy of the original source tree) within the build area and place our build artifacts within the derived source tree. Then, we could add the derived source tree to the MATLAB search path. That's just one option, but there are others we could explore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-13125) [R] Throw error when 2+ args passed to desc() in arrange()
[ https://issues.apache.org/jira/browse/ARROW-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook resolved ARROW-13125. -- Resolution: Fixed Issue resolved by pull request 10559 [https://github.com/apache/arrow/pull/10559] > [R] Throw error when 2+ args passed to desc() in arrange() > -- > > Key: ARROW-13125 > URL: https://issues.apache.org/jira/browse/ARROW-13125 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 4.0.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Minor > Labels: pull-request-available > Fix For: 5.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently this does not result in an error, but it should: > {code:r}Table$create(x = 1:3, y = 4:6) %>% arrange(desc(x, y)){code} > The same problem affects dplyr on R data frames. I opened > https://github.com/tidyverse/dplyr/issues/5921 for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13175) Technology Trends That Will f9zone Dominate 2017
[ https://issues.apache.org/jira/browse/ARROW-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13175: --- > Technology Trends That Will f9zone Dominate 2017 > > > Key: ARROW-13175 > URL: https://issues.apache.org/jira/browse/ARROW-13175 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major
[jira] [Deleted] (ARROW-13179) Impact of Smart Technology on Data friv4school
[ https://issues.apache.org/jira/browse/ARROW-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13179: --- > Impact of Smart Technology on Data friv4school > -- > > Key: ARROW-13179 > URL: https://issues.apache.org/jira/browse/ARROW-13179 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13180) Teaching With f95 Technology
[ https://issues.apache.org/jira/browse/ARROW-13180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13180: --- > Teaching With f95 Technology > > > Key: ARROW-13180 > URL: https://issues.apache.org/jira/browse/ARROW-13180 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Teaching with technology helps to expand student learning by assisting instructional objectives. However, it can be challenging to select the best technology tools without losing sight of the goal for student learning. An expert can find creative and constructive ways to integrate technology into a class.
> What do we mean by technology? The term technology refers to the development of the techniques and tools we use to solve problems or achieve goals. Technology can encompass all kinds of tools, from low-tech pencils, paper and chalkboards to presentation software, or high-tech tablets, online collaboration and conference tools and more. The newest technologies allow us to try things in physical and virtual classrooms that were not possible before.
> How can technology help students? Technology can help a student in the following ways:
> 1. Online collaboration tools: Technology has helped students and instructors share documents online, edit them in real time and project them on a screen. This gives students a collaborative platform in which to brainstorm ideas and document their work using text and pictures.
> 2. Presentation software: This enables the instructor to embed high-resolution photographs, diagrams, videos and sound files to augment the text and verbal lecture content.
> 3. Tablets: Tablets can be linked to computers, projectors and the cloud so that students and instructors can communicate through text, drawings and diagrams.
> 4. Course management tools: These allow instructors to organize all the resources students need for the class: the syllabus, assignments, readings and online quizzes.
> 5. Smartphones: These are a quick and easy way to survey students during class. They enable instant polling, which can quickly assess students' understanding and help instructors adjust pace and content.
> 6. Lecture capture tools: These allow instructors to record lectures directly from their computers without elaborate or additional classroom equipment, and students can review the recorded lectures at their own pace.
> What are the advantages of technology integration in the education sphere? Teaching strategies based on educational technology facilitate student learning and boost their capacity, productivity and performance. Technology integration inspires positive changes in teaching methods on an international level. The following list of benefits will help in reaching a final conclusion:
> 1. Technology makes teaching easy: Technology has power. It enables the use of projectors and computer presentations to deliver *[f95|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* any type of lesson or instruction and improves the level of comprehension within the class, rather than giving theoretical explanations that students cannot understand.
> 2. It facilitates student progress: Technology has given teachers platforms and tools that let them keep track of individual achievements.
> 3. Education technology is good for the environment: If all schools committed to using digital textbooks, can you imagine the amount of paper and the number of trees that would be saved? Students can be instructed to take tests online and submit their papers and homework through email. They can also be encouraged to use e-readers to read through the assigned literature.
> 4. It has made students enjoy learning: Students enjoy learning through their attachment to Facebook, Instagram, Digg and other websites from a very early age. The internet can distract them from the learning process, but learning can be made enjoyable by setting up a private Facebook group for the class and inspiring constructive conversations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13176) T Is for Technology in Triathlon Training
[ https://issues.apache.org/jira/browse/ARROW-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13176: --- > T Is for Technology in Triathlon Training > - > > Key: ARROW-13176 > URL: https://issues.apache.org/jira/browse/ARROW-13176 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> The original triathletes were amazing. Dave Scott and Mark Allen accomplished amazing feats in triathlon long before technology took over the sport. They didn't have the metrics we have today, and they certainly didn't have all of the information-gathering abilities we have. Yet they set records and competed valiantly. In fact, Mark Allen still holds the marathon record in Kona to this day. Technology is a great friend to triathletes, but it does have a downside.
> TECHNOLOGY ITEMS
> Technology has taken over every part of triathlon. One of the most widely researched areas is the triathlon watch. Each and every year there are new watches available for purchase with ever-increasing measurements for the triathlete. My personal favorite is the Garmin 910XT. This watch gives me heart rate, power (with a power meter), pacing (with an optional foot pod), speed, cadence (with an optional cadence sensor), mileage, yards in swimming, and much more. Each of these measurements aids me in gauging my success or failure in each and every training session and race.
> Technology has been making huge strides in bicycles and wheel sets. The amount of research going into these two items within the world of triathlon is incredible. Each and every year there are new and exciting advances in the aerodynamics of bicycles and wheel sets. Much of the time these technologies take on two very different vantage points. This was most evident at the 2016 World Championships in Kona. Diamond Bikes unveiled their Andean bike, which fills in all the space between the front tire and the back tire with a solid piece so the wind passes by this area for aerodynamics. Another bike debuted at Kona this year with the exact opposite idea: the Ventum bike eliminated the down tube and left a vacant space between the front tire and the back tire, with only the top tube remaining. These are two very different ideas about aerodynamics. This is one of the amazing things about the advancement of technology, and one of the downsides as well.
> Each and every piece of equipment in triathlon is undergoing constant technological advancement. Shoes, wetsuits, socks, nutrition, hats, sunglasses, helmets, racing kits, and anything else you can imagine. This world of technology in triathlon is nowhere near completion and will continue to push the limits.
> THE UPSIDE TO TECHNOLOGY
> Technology in triathlon is amazing. These new items are exciting and make each and every year different. There are new advancements that help triathletes go faster and longer. These new technologies help even the amateur triathlete go faster. Just the purchase of new wheels can mean the difference between being on or off the podium. The advancement of shoes has helped many athletes avoid the injuries that plague so many, such as plantar fasciitis. Technology will continue to help the sport become better and better.
> THE DOWNSIDE TO TECHNOLOGY
> The downside to technology is that the amateur triathlete arrives at their local race already incapable of winning because someone else had the money to buy the latest technology. The biggest purchases, such as wheel sets and bicycles, can be cost-prohibitive for the average triathlete, and yet *[friv.com|https://complextime.com/friv-everything-you-need-to-know-about-it/]* there are individuals who purchase these items at alarming rates. The amateur triathlete can also feel overwhelmed about what to purchase and what not to purchase. Some items of technology are not worth the extra cost because they do not decrease racing time significantly enough for what they cost. Now that these new technologies have been out a while, knock-offs have begun to appear as lower-cost items. It will be interesting to watch the flood of these knock-offs into the market and see how it affects the big players in technology.
> If you are an amateur triathlete, shop smart and don't buy new gadgets just because they are new. Make sure to invest in items that will truly make you faster, not just gimmicks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13183) How Has Technology Changed f95z Our Lives?
[ https://issues.apache.org/jira/browse/ARROW-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13183: --- > How Has Technology Changed f95z Our Lives? > -- > > Key: ARROW-13183 > URL: https://issues.apache.org/jira/browse/ARROW-13183 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> In the midst of the darkness that engulfed the world, technology changed the entire life of human beings. Undoubtedly, technology has some negative repercussions, but its positive results carry more weight than the negative ones. However, it seems a little difficult for us to believe that technology has changed our life, because it has taken its place slowly and gradually. The innumerable justifications spotlighted below show how technology has changed our life entirely.
> Education System
> Education is a broad field, but if we take only a single aspect, the way of learning, we can see what a great difference technology has made to our life. For instance, when we were young, it was hard to get a good education with a variety of examples, and we used to buy different expensive books just for the sake of a few topics, to make notes and get good marks in our exams. In this technological world, however, it has become very easy to access different topics on the internet in a very short span of time, and they can also be shared with friends on social media.
> Business System
> In earlier times, it was very difficult to advertise a newly launched business with outdated methods such as pasting posters on walls, distributing pamphlets to people in a busy market, etc. In this contemporary world, however, technology has made it very easy to advertise our business in different places, such as on internet sites, on social media, on big LCDs along busy roads, etc. This is how our life has changed thanks to technical assistance, and we can easily promote our business in no time.
> Medical Department
> Besides the field of business, the medical department is at its peak just because of technology. In earlier times, malaria, a fatal disease, cost many people their lives, but now malaria, which is caused by Plasmodium, can easily be treated without any risk. Similarly, medical *[f95z|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* science is working efficiently, and it has discovered innumerable ways to live a more secure life than before. Therefore, technology is the main force that has changed our life.
> Communication System
> Last but not least, the communication system has completely changed our life in this technological world and has turned the world into a global village. Formerly, people used to send their messages by pigeon, then by postman, but now it has become very easy not just to send a message but also to make a video call to the person you want to reach. It is the internet, along with smartphones, that has made it easier for every individual to connect with all his distant relatives around the world. Thus, it is technology that has made our lives easier than before. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13182) Innovative Ideas in the Field of f95 zone Technology
[ https://issues.apache.org/jira/browse/ARROW-13182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13182: --- > Innovative Ideas in the Field of f95 zone Technology > > > Key: ARROW-13182 > URL: https://issues.apache.org/jira/browse/ARROW-13182 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Innovative ideas in the field of technology have simplified work and aided our rapid development. These ideas contribute to the creation of innovative technologies over time. In order to create an innovative idea, it is necessary to have knowledge, which is fundamental to this process. Thus we get the scheme: knowledge, idea, technology.
> To date, innovative technologies are traditionally divided into two segments: information technologies (technologies for automated information processing) and communication technologies (technologies for the storage and transmission of information). For example, with the help of communication technologies, people can receive and transmit various content even from different corners of our world. International relations, including education, business negotiations and much more, are now possible faster and more efficiently. If we recall the communication innovations in the field of education, first of all it should be emphasized that people can enter higher education institutions and study remotely regardless of their location. Furthermore, every qualified pedagogue teaches something new and useful. Communication with representatives of other countries contributes to our self-development. All this eventually promotes the creation of qualified, unique staff.
> Information technologies allow us:
> - To automate certain labour-intensive operations;
> - To automate and optimize production planning;
> - To optimize individual business processes (for example, customer relations, asset management, document management, management decision-making), taking into account the specifics of various branches of economic activity.
> Information technology is used for large data processing systems, computing on a personal computer, in science and education, in management, in computer-aided design and in the creation of systems with artificial intelligence. Information technologies are the modern technological systems of immense strategic importance (political, defence, economic, social and cultural), which led to the formation of a new concept of [*f95 zone*|https://complextime.com/f95zone-what-is-it-and-why-use-it/] the world order: "who owns the information owns the world."
> The spread of information and communication technologies plays an important role in structural changes in all areas of our life. For some, it will be difficult to learn these technologies. Workers who cannot study will have to give way to the younger generation. Thus we are faced with a problem: in order to use innovations in technologies and develop them, it is necessary to have qualified youth. First and foremost there is the question of education. Only education can create a developed generation that will continue to strive for new knowledge and meet the requirements of innovative technologies. In addition, I am convinced that innovative ideas in technologies have created a completely new life, which poses new challenges for our country. How we cope with these tasks will determine the future of our country. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Deleted] (ARROW-13177) Technology Acceptance juego friv Model
[ https://issues.apache.org/jira/browse/ARROW-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13177: --- > Technology Acceptance juego friv Model > -- > > Key: ARROW-13177 > URL: https://issues.apache.org/jira/browse/ARROW-13177 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Advances in computing and information technology are changing the way people meet and communicate. People can meet, talk, and work together outside traditional meeting and office spaces. For instance, the introduction of software designed to help people schedule meetings and facilitate decision or learning processes is weakening geographical constraints and changing interpersonal communication dynamics. Information technology is also dramatically affecting the way people teach and learn.
> As new information technologies infiltrate workplaces, homes, and classrooms, research on user acceptance of new technologies has started to receive much attention from professionals as well as academic researchers. Developers and software industries are beginning to realize that lack of user acceptance of technology can lead to loss of money and resources.
> In studying user acceptance and use of technology, the TAM is one of the most cited models. The Technology Acceptance Model (TAM) was developed by Davis to explain computer-usage behavior. The theoretical basis of the model was Fishbein and Ajzen's Theory of Reasoned Action (TRA).
> The Technology Acceptance Model (TAM) is an information systems theory (a system consisting of the network of all communication channels used within an organization) that models how users come to accept and use a technology. The model suggests that when users are presented with a new software package, a number of factors influence their decision about how and when they will use it, notably:
> Perceived usefulness (PU) - This was defined by Fred Davis as "the degree to which a person believes that using a particular system would enhance his or her job performance".
> Perceived ease-of-use (PEOU) - Davis defined this as "the degree to which a person believes that using a particular system would be free from effort" (Davis, 1989).
> The goal of TAM is "to provide an explanation of the determinants of computer acceptance that is general, capable of explaining user behavior across a broad range of end-user computing technologies and user populations, while at the same time being both parsimonious and theoretically justified".
> According to the TAM, if a user perceives a specific technology as useful, she/he will believe in a positive use-performance relationship. Since effort is a finite resource, a user is likely to accept an application when she/he perceives it as easier to use than another. As a consequence, educational technology with a high level of PU and PEOU is more likely to induce positive perceptions. The relation between PU and PEOU is that PU mediates the effect of PEOU on attitude and intended use. In other words, while PU has direct impacts on attitude and use, PEOU influences attitude and use indirectly through PU.
> User acceptance is defined as "the demonstrable willingness within a user group to employ information technology for the tasks it is designed to support" (Dillon & Morris). Although this definition focuses on planned and intended uses of technology, studies report that individual perceptions of information technologies are likely to be influenced by the objective characteristics of technology, as well as by interaction with other users. For example, to the extent that one evaluates a new technology as useful, she/he is likely to use it. At the same time, her/his perception of the system is influenced by the way people around her/him evaluate and use the system.
> Studies on information technology continuously report that user attitudes are important factors affecting the success of a system. For the past several decades, many definitions of attitude have been proposed. However, all theories consider attitude to be a relationship between a person and an [*juego friv*|https://complextime.com/friv-everything-you-need-to-know-about-it/] object (Woelfel, 1995).
> In the context of information technologies, the technology acceptance model (TAM) is one approach to the study of attitude. TAM suggests users formulate a positive attitude toward a technology when they perceive it to be useful and easy to use (Davis, 1989).
> A review of scholarly research on IS acceptance and usage suggests that TAM has emerged as one of the most influential models in this stream of research.
[jira] [Deleted] (ARROW-13181) Big Data and Technology Services Market f 95 zone
[ https://issues.apache.org/jira/browse/ARROW-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13181: --- > Big Data and Technology Services Market f 95 zone > - > > Key: ARROW-13181 > URL: https://issues.apache.org/jira/browse/ARROW-13181 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Big data has been touted as the next massive transformation in global data analysis and management. Businesses around the globe have incorporated big data in their operations to make sense of the seemingly myriad data generated on a consistent basis. The adoption of big data technology and services has grown at a robust pace among end-use industries. As big data becomes more mainstream, and integration with cloud and artificial intelligence becomes more streamlined, further growth is projected. According to a recently published report, the global big data technology and services market is poised to reach a valuation of over US$ 184 Bn.
> Data-Driven Decision Making Continues to Fuel Adoption of Big Data Technology and Services
> Over the years, there has been a significant shift in how businesses make critical business decisions. Assumptions and traditional intelligence gathering have given way to fact-based, data-driven decision making, which has furthered the cause for adopting big data solutions. This change in the status quo has been one of the key factors in the growing adoption of big data technology and services in various end-use industries. As more businesses realize the advantages of big data in decision-making, it is highly likely that adoption of big data technology and services will grow at a steady pace in the short and long term.
> The information big data analysis brings to the fore has also helped businesses bridge the challenges associated with agility and stakeholder empowerment. Businesses have traditionally faced an uphill task in finding that elusive balance between agility and decentralization. Counting in everyone's opinion before making big decisions has been the utopian goal of businesses; however, it also comes with the risk of slowing down the decision-making process in a hyper-competitive environment. The RACI framework, which businesses have referred to in order to reduce ambiguity in choosing the right authority for decision-making, is becoming easier to navigate as access to data makes the entire decision-making process a seamless affair.
> Integration of Big Data with Traditional Business Intelligence - The Way Forward?
> Integration of big data technology and services with traditional business intelligence is being looked upon as the way forward for businesses focusing on quick fact-based decision making and improvement in customer experience. Business intelligence has been a reliable tool for companies to understand their target audience more intimately; however, the high turnaround time has remained an impediment. The incorporation of big data has mitigated this challenge to an extent, which in turn has fuelled adoption among end-users. In the future, it is highly likely that big data and business intelligence will become highly intertwined.
> Banking, Financial Services and Insurance (BFSI) Industry Continues to Be at the Forefront of Adoption
> Although adoption of big data technology and services has been pervasive, the BFSI sector has remained at the forefront of adoption since the early days of big data. The sheer volume of data generated on a daily basis in the BFSI industry has necessitated the adoption of holistic data monitoring, gathering, and analysis solutions. Some of the key challenges that the BFSI sector currently faces include fraud identification, unorganized data, and operational inefficiency. The inclusion of big data technology and services has helped alleviate some of these challenges to a great extent. On the back of these improvements, there has been significant penetration of big data in the BFSI sector. According to current estimates, revenues generated from adoption of big data technology and services are likely to reach over US$ 33 billion by 2026.
> Inclusion of Big Data Technology and Services Gaining Ground in the Healthcare Sector
> Big data has massive potential in the healthcare industry, with proponents touting benefits ranging from epidemic prediction to reduced cost of treatments. Although electronic health records (EHR) have been a staple in the healthcare sector for quite a while, their efficacy is limited to the medical history of patients. Big data, on the other hand, promises a comprehensive, holistic data analysis
[jira] [Deleted] (ARROW-13178) Disruptive Technologies firv
[ https://issues.apache.org/jira/browse/ARROW-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13178: --- > Disruptive Technologies firv > > > Key: ARROW-13178 > URL: https://issues.apache.org/jira/browse/ARROW-13178 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> I am not into technologies, which change ever so fast, and always. But I do observe technological trends, around which the development of scientific applications revolves.
> And of all trends, perhaps disruptive technologies are the defining path of industrial implications, a linear passage that technological progress almost invariably follows. Though the concept of "disruptive technologies" was only popularized in 1997 by Harvard Business School Professor Clayton Christensen in his best-seller "The Innovator's Dilemma", the phenomenon was already evidenced back in 1663, when Edward Somerset published designs for, and might have installed, a steam engine.
> As put forth by Clayton Christensen, disruptive technologies are initially low performers with poor profit margins, targeting only a minute sector of the market. However, they often develop faster than industry incumbents and eventually outpace the giants to capture significant market share, as their technologies, cheaper and more efficient, better meet prevailing consumer demands.
> In this case, steam engines effectively displaced horse power. The demand for steam engines was not initially high, due to unfamiliarity with the invention at the time, and the ease of use and availability of horses. However, as soon as economic activities intensified and societies prospered, a niche market for steam engines quickly developed as people wanted modernity and faster transportation.
> One epitome of modern disruptive technologies is Napster, a free and easy music sharing program that allowed users to distribute any piece of recording online. The disruptee here was conventional music producers. Napster relevantly identified the "non-market", the few who wanted to share their own music recordings for little commercial purpose, and thus provided them with what they most wanted. Napster soon blossomed and even transformed the way the internet was utilized.
> Nevertheless, there are more concerns in the attempt to define disruptive technologies than simply the definition itself.
> The feature most commonly mistaken for disruptive technology is sustaining technology. While the former brings new technological innovation, the latter refers to "successive incremental improvements to performance" incorporated into the existing products of market incumbents. Sustaining technologies can be radical, too; new improvements can herald the demise of current modes of production, like how music editing software helps Napster users with music customization and sharing, thereby trumping traditional whole-file transfers. The music editors are part of a sustaining technology to Napster, not a new disruptor. Thus, disruptive and sustaining technologies can thrive together, until the next wave of disruption comes.
> See how music editors are linked to steam engines? Not too closely, but each represents one aspect of the twin engines that drive progressive technologies; disruptors breed sustainers, and sustainers feed disruptors.
> This character of sustaining technologies brings us to another perspective on disruptive technologies: they not only change the way people do business, but also initiate a fresh wave of follow-up technologies that propel the disruptive technology to success. Sometimes, sustaining technologies manage to carve out a niche market of their own even when the disruptive initiator has already shut down. Music editor and maker software continues to thrive healthily, despite Napster's breakdown (though many other file sharing services were functioning by then), with products like AV Music Morpher Gold and Sound Forge 8.
> A disruptive technology is also different from a paradigm shift, which Thomas Kuhn used to describe "the process and result of a change in basic assumptions within the ruling theory of science". In disruptive technologies, there are no assumptions, only the rules of the game, in which change is brought about by the behaviors of market incumbents and new entrants. They augment different markets that eventually merge. In Clayton Christensen's words, newcomers to the industry almost invariably "crush the incumbents".
> While researching disruptive [*firv*|https://complextime.com/friv-everything-you-need-to-know-about-it/] technologies, I came across this
[jira] [Deleted] (ARROW-13184) Technology Trends That Will f9zone Dominate 2017
[ https://issues.apache.org/jira/browse/ARROW-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane deleted ARROW-13184: --- > Technology Trends That Will f9zone Dominate 2017 > > > Key: ARROW-13184 > URL: https://issues.apache.org/jira/browse/ARROW-13184 > Project: Apache Arrow > Issue Type: Bug >Reporter: Abigail Cole >Priority: Major > >
> Technology has remarkably changed the way we live today; there is no denying it. Compared with our ancestors, we stand far from them in using different technologies for our day-to-day work.
> So many technologies have been developed in the past couple of years that have revolutionized our lives, and it's impossible to list each of them. Though technology changes fast with time, we can observe the trends in which it changes. Last year, 2016, brought so many fresh innovative ideas and creations in automation, integration and so on, and this year, 2017, is expected to continue a similar trend.
> In this article, we are going to discuss some of the notable trends for this year, which will make us look beyond the horizon.
> Gartner's 2016 Hype Cycle for emerging technologies has identified different technologies that will be trending this year. The cycle illustrates how technology innovations are redefining the relations between the customer and the marketer.
> This year, Gartner has identified Blockchains, Connected Homes, Cognitive Expert Advisors, Machine Learning, Software-defined Security, etc. as the overarching technology trends, which have the potential to reshape business models and offer enterprises a definite route to emerging markets and ecosystems.
> #1. Blockchain
> Popularly known as 'Distributed Ledger Technology' for both financial and non-financial transactions, blockchain is one of those mystifying concepts that only technologists understand to the fullest. Various advancements in blockchain helped many people and businesses in 2016 to experience its potential in the banking and finance industry. This year, it is anticipated that blockchain technology will go beyond just the banking sector, helping start-ups and established businesses address market needs with different application offerings.
> #2. Internet of Things & Smart Home Tech
> With the advent of IoT, we are already eyeing a world of inter-connected things, aren't we? Our dreams of living in smart homes were met to a certain extent in 2016. So, what is stopping us from fulfilling our dreams of living in smart connected homes?
> Well, the fact is that the market is full of abundant individual appliances and apps, but only a few solutions integrate them into a single, inclusive user experience. It is anticipated that 2017 will see this trend take a big step towards fulfilling our dreams.
> #3. Artificial Intelligence & Machine Learning
> In recent times, Artificial Intelligence and Machine Learning have taken the entire world by storm with their amazing inventions and innovative technologies. Observing the ongoing advancements in this field, it will no longer be mere imagination to experience a world where robots and machines dominate society.
> Last year, we witnessed the rise of ML algorithms on almost all major e-commerce portals and their associated mobile apps, which is further expected to spread across all social networking platforms, dating websites, and matrimonial websites in 2017.
> #4. Software-defined Security
> In 2016, we observed significant growth in server security. Many organizations have started recognizing the significance of cybersecurity to enable their emergence as digital businesses. The growth of cloud-based infrastructure is creating great demand for managing unstructured data; moreover, the lack of technical expertise and threats to data security are the key factors hindering the substantial growth of the software-defined security market this year.
> #5. Automation
> Automation will be the mainstay throughout 2017; the coming years will be transformative for the IT industry, enabling the automation of tasks performed by humans. When Machine Learning is combined with automation, [*f9zone*|https://complextime.com/f95zone-what-is-it-and-why-use-it/] marketers are likely to witness wide business opportunities with enriched market results.
> #6. Augmented Reality (AR) & Virtual Reality (VR)
> AR and VR transform the way users interact with each other and with software systems. The year 2016 saw path-breaking steps in AR and VR technology.
> With the launch of Oculus Rift, the market received an overwhelming response
[jira] [Closed] (ARROW-13184) Technology Trends That Will f9zone Dominate 2017
[ https://issues.apache.org/jira/browse/ARROW-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Keane closed ARROW-13184. -- Resolution: Not A Bug > Technology Trends That Will f9zone Dominate 2017 > > > Key: ARROW-13184 > URL: https://issues.apache.org/jira/browse/ARROW-13184 > Project: Apache Arrow > Issue Type: Bug > Reporter: Abigail Cole > Priority: Major
[jira] [Updated] (ARROW-13149) [R] Convert named lists to structs instead of (unnamed) lists
[ https://issues.apache.org/jira/browse/ARROW-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13149: --- Labels: pull-request-available (was: ) > [R] Convert named lists to structs instead of (unnamed) lists > - > > Key: ARROW-13149 > URL: https://issues.apache.org/jira/browse/ARROW-13149 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Jonathan Keane >Assignee: Jonathan Keane >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
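The distinction ARROW-13149 draws can be illustrated outside R: a named collection maps naturally to an Arrow struct (one field per name), while an unnamed collection maps to a list. A minimal Python sketch of that inference rule, assuming a hypothetical `infer_arrow_like_type` helper that only mimics the intended mapping and is not part of any Arrow API:

```python
def infer_arrow_like_type(value):
    """Sketch of the inference rule requested in ARROW-13149:
    named entries become struct fields, unnamed sequences become lists."""
    if isinstance(value, dict):  # analogue of an R named list
        return ("struct", {name: infer_arrow_like_type(v)
                           for name, v in value.items()})
    if isinstance(value, (list, tuple)):  # analogue of an R unnamed list
        children = [infer_arrow_like_type(v) for v in value]
        child = (children[0]
                 if children and all(c == children[0] for c in children)
                 else "mixed")
        return ("list", child)
    return type(value).__name__

# A named list of scalars -> struct<a: int, b: str>
print(infer_arrow_like_type({"a": 1, "b": "x"}))
# An unnamed list of ints -> list<int>
print(infer_arrow_like_type([1, 2, 3]))
```

The key design point is that the presence of names, not the element types, selects struct over list, mirroring the issue's title.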
[jira] [Updated] (ARROW-13172) [Java] Make TYPE_WIDTH in Vector public
[ https://issues.apache.org/jira/browse/ARROW-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li updated ARROW-13172: - Summary: [Java] Make TYPE_WIDTH in Vector public (was: Make TYPE_WIDTH in Vector public) > [Java] Make TYPE_WIDTH in Vector public > --- > > Key: ARROW-13172 > URL: https://issues.apache.org/jira/browse/ARROW-13172 > Project: Apache Arrow > Issue Type: Improvement > Components: Java > Reporter: Eduard Tudenhoefner > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Some Vector classes already expose TYPE_WIDTH publicly. It would be > helpful if all Vector classes did so. -- This message was sent by Atlassian Jira (v8.3.4#803005)
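The motivation for a public width constant is offset arithmetic: in a fixed-width vector, element i starts at byte offset i * TYPE_WIDTH in the data buffer. A hedged Python sketch of that arithmetic over a packed little-endian int32 buffer (an illustration of the layout only, not Arrow's Java API):

```python
import struct

TYPE_WIDTH = 4  # bytes per element for a 32-bit integer vector

def element_at(data: bytes, index: int) -> int:
    """Read element `index` from a packed little-endian int32 buffer,
    using the offset arithmetic a public TYPE_WIDTH enables."""
    offset = index * TYPE_WIDTH
    return struct.unpack_from("<i", data, offset)[0]

buf = struct.pack("<4i", 10, 20, 30, 40)  # 16-byte data buffer
print(element_at(buf, 2))  # 30
```

Without the width exposed, callers working against raw Arrow buffers would have to hard-code per-type byte sizes, which is exactly the duplication the issue asks to avoid.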
[jira] [Updated] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li updated ARROW-13174: - Summary: [C++][Compute] Add strftime kernel (was: [C+][Compute] Add strftime kernel) > [C++][Compute] Add strftime kernel > -- > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > To convert timestamps to a string representation with an arbitrary format we > require a strftime kernel (the inverse operation of the {{strptime}} kernel > we already have). > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
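The requested kernel mirrors the classic strftime/strptime pair: strptime parses a string into a timestamp, and strftime is its inverse, formatting a timestamp back into a string with an arbitrary format. The round trip can be sketched with Python's standard library (this illustrates the semantics only, not Arrow's compute API):

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"

# strptime: string -> timestamp (the kernel Arrow already has)
ts = datetime.strptime("2021-06-25 13:45:00", fmt)

# strftime: timestamp -> string (the inverse kernel this issue requests)
s = ts.strftime(fmt)

print(s)  # "2021-06-25 13:45:00"
assert datetime.strptime(s, fmt) == ts  # format/parse round trip
```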
[jira] [Created] (ARROW-13184) Technology Trends That Will f9zone Dominate 2017
Abigail Cole created ARROW-13184: Summary: Technology Trends That Will f9zone Dominate 2017 Key: ARROW-13184 URL: https://issues.apache.org/jira/browse/ARROW-13184 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole
[jira] [Created] (ARROW-13183) How Has Technology Changed f95z Our Lives?
Abigail Cole created ARROW-13183: Summary: How Has Technology Changed f95z Our Lives? Key: ARROW-13183 URL: https://issues.apache.org/jira/browse/ARROW-13183 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole In the midst of the darkness that engulfed the world, technology changed the entire life of human beings. Undoubtedly, technology has some negative repercussions, but its positive results carry more weight than the negative ones. However, it is a little difficult for us to believe that technology has changed our life, because it has taken its place slowly and gradually. The innumerable justifications spotlighted below show how technology has changed our life in toto. Education System Education is a broad field, but if we take only a single aspect, the way of learning, we can see what a great difference technology has made. For instance, when we were young it was hard for us to get a good education with a variety of examples; we used to buy different expensive books covering only limited topics, just to make notes and get good marks in our exams. However, in this technological world it has become very easy to access different topics on the internet in a very short span of time, and they can also be shared with friends on social media. Business System In ancient times it was difficult to advertise a newly launched business with outdated methods such as pasting posters on the wall or distributing pamphlets to people in a busy market. However, in this contemporary world technology has made it very easy to share advertisements for our business in different places, such as on internet sites, on social media, and on big LCDs along busy roads. So, this is how our life has changed thanks to technical assistance: we can easily promote our business in no time. Medical Department Besides the field of business, the Medical Department is at its peak just because of technology. In earlier times Malaria, a fatal disease, cost many people their lives, but now Malaria, which is caused by Plasmodium, can easily be treated without any risk. Similarly, medical *[f95z|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* science is working efficiently and has found innumerable ways to live a more secure life than before. Therefore, technology is the force that has changed our life. Communication System Last but not least, the communication system has completely changed our life in this technological world and has made the world a global village. Formerly, people used to send their messages with the help of pigeons, then the postman, but now it has become very easy not just to send a message but also to make a video call to the person you want to reach. It is the internet, along with smartphones, that has made it easier for every individual to connect with all his distant relatives around the world. Thus, it is technology that has made our lives easier than before. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13182) Innovative Ideas in the Field of f95 zone Technology
Abigail Cole created ARROW-13182: Summary: Innovative Ideas in the Field of f95 zone Technology Key: ARROW-13182 URL: https://issues.apache.org/jira/browse/ARROW-13182 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Innovative ideas in the field of technology have simplified our work and helped our rapid development. These ideas contribute to the creation of innovative technologies over time. In order to create an innovative idea, it is necessary to have knowledge, which is fundamental to this process. Thus we get the scheme: knowledge, idea, technology. To date, innovative technologies are traditionally divided into two segments: information technologies (technologies for automated information processing) and communication technologies (technologies for the storage and transmission of information). For example, with the help of communication technologies, people can receive and transmit various content while in different corners of our world. International relations, including education, business negotiations and much more, are now faster and more efficient. If we recall the communication innovations in the field of education, first of all it should be emphasized that people can enter higher education institutions and study remotely regardless of their location. Furthermore, every qualified pedagogue teaches something new and useful. Communication with representatives of other countries contributes to our self-development. All this eventually promotes the creation of qualified, unique staff. Information technologies allow: - The automation of certain labour-intensive operations; - The automation and optimization of production planning; - The optimization of individual business processes (for example, customer relations, asset management, document management, management decision-making), taking into account the specifics of various branches of economic activity. Information technology is used for large data processing systems, computing on a personal computer, in science and education, in management, in computer-aided design and in the creation of systems with artificial intelligence. Information technologies are the modern technological systems of immense strategic importance (political, defence, economic, social and cultural), which have led to the formation of a new concept of [*f95 zone*|https://complextime.com/f95zone-what-is-it-and-why-use-it/] the world order: "whoever owns the information owns the world." The spread of information and communication technologies plays an important role in structural changes in all areas of our life. For some, it will be difficult to learn these technologies. Workers who are not able to study them will have to give way to the younger generation. Thus we are faced with a problem: in order to use innovations in technology and develop them, it is necessary to have qualified youth. First and foremost there is the question of education. In any case, only education can create a developed generation that will continue to strive for new knowledge and will meet the requirements of innovative technologies. In addition, I am convinced that innovative ideas in technology have created a completely new life, which poses new challenges for our country. How we cope with these tasks will determine the future of our country. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13181) Big Data and Technology Services Market f 95 zone
Abigail Cole created ARROW-13181: Summary: Big Data and Technology Services Market f 95 zone Key: ARROW-13181 URL: https://issues.apache.org/jira/browse/ARROW-13181 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Big data has been touted as the next massive transformation in global data analysis and management. Businesses around the globe have incorporated big data into their operations to make sense of the myriad data generated on a consistent basis. The adoption of big data technology and services has grown at a robust pace among end-use industries. As big data becomes more mainstream, and integration with cloud and artificial intelligence becomes more streamlined, further growth is projected. According to a recently published report, the global big data technology and services market is poised to reach a valuation of over US$ 184 Bn. Data-Driven Decision Making Continues to Fuel Adoption of Big Data Technology and Services Over the years, there has been a significant shift in how businesses make critical business decisions. Assumptions and traditional intelligence gathering have given way to fact-based, data-driven decision making, which has furthered the cause for adopting big data solutions. The change in the status quo has been one of the key factors in the growing adoption of big data technology and services in various end-use industries. As more businesses realize the advantages of big data in decision-making, it is highly likely that adoption of big data technology and services will grow at a steady pace in the short and long term. The information big data analysis brings to the fore has also helped businesses bridge the challenges associated with agility and stakeholder empowerment. Businesses have traditionally faced an uphill task in finding that elusive balance between agility and decentralization. Counting in everyone's opinion before making big decisions has been the utopian focus of businesses; however, it also comes with the risk of slowing down the decision-making process in a hyper-competitive environment. The RACI framework, which businesses have referred to in order to reduce ambiguity in choosing the right authority for decision-making, is becoming easier to navigate as access to data makes the entire decision-making process a seamless affair. Integration of Big Data with Traditional Business Intelligence - The Way Forward? Integration of big data technology and services with traditional business intelligence is being looked upon as the way forward for businesses focusing on quick fact-based decision making and improvement in customer experience. Business intelligence has been a reliable tool for companies to understand their target audience more intimately; however, the high turnaround time has remained an impediment. The incorporation of big data has mitigated this challenge to an extent, which in turn has fuelled adoption among end-users. In the future, it is highly likely that big data and business intelligence will become highly intertwined. Banking, Financial Services and Insurance (BFSI) Industry Continues to be at the Forefront of Adoption Although adoption of big data technology and services has been pervasive, the BFSI sector has remained at the forefront of adoption since the early days of big data. The sheer volume of data generated on a daily basis in the BFSI industry has necessitated the adoption of holistic data monitoring, gathering, and analysis solutions. Some of the key challenges that the BFSI sector currently faces include fraud identification, unorganized data, and operational inefficiency. The inclusion of big data technology and services has helped alleviate some of these challenges to a great extent. On the back of these improvements, there has been significant penetration of big data in the BFSI sector. According to current estimates, revenues generated from the adoption of big data technology and services are likely to reach over US$ 33 billion by 2026. Inclusion of Big Data Technology and Services Gaining Ground in Healthcare Sector Big data has massive potential in the healthcare industry, with proponents touting benefits ranging from epidemic prediction to reduced treatment costs. Although electronic health records (EHR) have been a staple in the healthcare sector for quite a while, their efficacy is limited to the medical history of patients. Big data, on the other hand, promises comprehensive, holistic data analysis that can help healthcare providers manage massive volumes of data. The insights offered through the inclusion of big data technology and services can help healthcare *[f 95 zone|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* providers improve profitability, while improving the care received by
[jira] [Created] (ARROW-13180) Teaching With f95 Technology
Abigail Cole created ARROW-13180: Summary: Teaching With f95 Technology Key: ARROW-13180 URL: https://issues.apache.org/jira/browse/ARROW-13180 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Teaching with technology helps to expand student learning by assisting instructional objectives. However, it can be challenging to select the best technology tools without losing sight of the goal of student learning. An expert can find creative and constructive ways to integrate technology into the class. What do we mean by technology? The term technology refers to the development of the techniques and tools we use to solve problems or achieve goals. Technology can encompass all kinds of tools, from low-tech pencils, paper and a chalkboard to the use of presentation software, or high-tech tablets, online collaboration and conference tools and more. The newest technologies allow us to try things in physical and virtual classrooms that were not possible before. How can technology help students? Technology can help a student in the following ways: 1. Online collaboration tools: Technology has helped students and instructors to share documents online, edit them in real time, and project them on a screen. This gives students a collaborative platform on which to brainstorm ideas and document their work using text and pictures. 2. Presentation software: This enables the instructor to embed high-resolution photographs, diagrams, videos and sound files to augment the text and verbal lecture content. 3. Tablet: Here, tablets can be linked to computers, projectors, and the cloud so that students and instructors can communicate through text, drawings, and diagrams. 4. Course management tools: These allow instructors to organize all the resources students need for the class: the syllabus, assignments, readings, online quizzes. 5. Smartphone: These are a quick and easy way to survey students during class. They are a great instant-polling tool which can quickly assess students' understanding and help instructors adjust pace and content. 6. Lecture capture tools: The lecture capture tools allow instructors to record lectures directly from their computer without elaborate or additional classroom equipment, and students can review the recorded lectures at their own pace. Advantages of technology integration in the education sphere? The teaching strategies based on educational technology can be described as ethical practices that facilitate student learning and boost their capacity, productivity, and performance. Technology integration inspires positive changes in teaching methods on an international level. The following list of benefits will help in reaching a final conclusion: 1. Technology makes teaching easy: technology has power. It helps in the use of projectors and computer presentations to deliver *[f95|https://complextime.com/f95zone-what-is-it-and-why-use-it/]* any type of lesson or instruction and improve the level of comprehension within the class, rather than giving theoretical explanations that students cannot understand. 2. It facilitates student progress: technology lets teachers rely on platforms and tools that enable them to keep track of individual achievements. 3. Education technology is good for the environment: if all schools were dedicated to using digital textbooks, can you imagine the amount of paper and the number of trees that would be saved? Students can be instructed to take an online test and submit their papers and homework through email. They can also be encouraged to use readers to read through the assigned literature. 4. It has made students enjoy learning: students enjoy learning through their addiction to Facebook, Instagram, Digg, and other websites from a very early age. The internet can distract them from the learning process, but learning can be made enjoyable by setting up a private Facebook group for the class and inspiring constructive conversations.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13179) Impact of Smart Technology on Data friv4school
Abigail Cole created ARROW-13179: Summary: Impact of Smart Technology on Data friv4school Key: ARROW-13179 URL: https://issues.apache.org/jira/browse/ARROW-13179 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole With evolving smart technologies, the entire process of rendering data entry services has become much easier. Smart technologies are now helping businesses strategically and economically by generating data from every possible source, including mobile phones, industrial equipment, smart accessories and personal computers. Data entry services are considered "smart" based on their responsiveness to incoming data. Businesses are looking for effective ways to manage data in order to obtain better value and support their ultimate objectives. Smart technologies tend to engage people and various smart devices with the related business, for better processing and collection of data from designated sources. To support and cope with the current evolution of such technologies, processes are being constantly renewed. There are various smart applications that enhance data analytics processes and make them even better. These include Cloud Computing, the Internet of Things, Smart Data and Machine Learning. Need for Smart Technology Data entry services, when offered with smart technologies, provide real-time data processing, thus improving a business's economic growth and providing a business-friendly option with efficient data management. Nowadays, businesses are striving for more innovative strategies while incorporating these smart apps. They eradicate the need for paper documents. They provide innovation with a customer-centered approach. These technologies are all industry-oriented, providing accurate results. They are scalable and easy to adopt. They work even better with unorganized data volumes. Collection of Data via Smart Technologies Smart technologies assist in collecting and assembling data through: Intelligent Capture, replacing template-based data extraction with an efficient capturing module and natural language understanding. Mobile Data Entry, for collecting data on various mobile devices, enabling smart data entry services. Robotic Process Automation (RPA), providing the latest smart recognition technology for improved data processing. Data Alteration through Smart Technologies For better use of these technologies, data entry services and methodologies are continuously being reshaped and revised, allowing organizations to take competitive advantage, along with enhancing the cost-efficiency and security of business operations. Smart technologies including Artificial Intelligence, Machine Learning and the Internet of Things have now replaced manual processes that are more time-consuming, leaving less room for human error. Let's talk about a few of these technologies: Artificial Intelligence and Machine Learning are more responsive and secure when it comes to managing any repetitive task, recognizing various patterns and enhancing the accuracy level. For an expanding number of data sources, and for creating connections between people, the internet, devices and businesses, the IoT (Internet of Things) is used extensively these days. From cloud computing services based on data entry services, businesses can derive benefit and manage the complexity of their data infrastructure. Effect of Intelligent Technologies Smart technologies are having a drastically positive impact on data entry services and offering a friendlier approach, providing benefits in the following ways: A better and more composed process, leading to a reduction in human errors. It has become faster and more efficient, with easy management of data in bulk and from different sources like paper forms, scanned images and much more. Streamlining business operations [*friv4school*|https://complextime.com/friv-everything-you-need-to-know-about-it/] and changing how businesses perceive data management projects. Increasing the potential to scale data entry processes and utilize innovative techniques. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13178) Disruptive Technologies firv
Abigail Cole created ARROW-13178: Summary: Disruptive Technologies firv Key: ARROW-13178 URL: https://issues.apache.org/jira/browse/ARROW-13178 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole I am not into technologies, those that change so ever fast, and always. But I do observe technological trends, along which the development of scientific applications revolves. And of all trends, perhaps disruptive technologies are the defining path of industrial implications, a linear passage that technological progress almost invariably follows. Though the concept of "disruptive technologies" is only popularized in 1997 by Harvard Business School Professor Clayton Christensen in his best-seller "The Innovator's Dilemma", the phenomenon was already evidenced back in 1663, when Edward Somerset published designs for, and might have installed, a steam engine. As put forth by Clayton Christensen, disruptive technologies are initially low performers of poor profit margins, targeting only a minute sector of the market. However, they often develop faster than industry incumbents and eventually outpace the giants to capture significant market shares as their technologies, cheaper and more efficient, could better meet prevailing consumers' demands. In this case, the steam engines effectively displaced horse power. The demand for steam engines was not initially high, due to the then unfamiliarity to the invention, and the ease of usage and availability of horses. However, as soon as economic activities intensified, and societies prospered, a niche market for steam engines quickly developed as people wanted modernity and faster transportation. One epitome of modern disruptive technologies is Napster, a free and easy music sharing program that allows users to distribute any piece of recording online. The disruptee here is conventional music producers. 
Napster relevantly identified the "non-market", the few who wanted to share their own music recordings for little commercial purpose, and thus provided them with what they most wanted. Napster soon blossomed and even transformed the way the internet was utilized. Nevertheless, there are more concerns in the attempt to define disruptive technologies than simply the definition itself. One most commonly mistaken feature for disruptive technologies is sustaining technologies. While the former brings new technological innovation, the latter refers to "successive incremental improvements to performance" incorporated into existing products of market incumbents. Sustaining technologies could be radical, too; the new improvements could herald the demise of current states of production, like how music editor softwares convenience Napster users in music customization and sharing, thereby trumping over traditional whole-file transfers. The music editors are part of a sustaining technological to Napster, not a new disruptor. Thus, disruptive and sustaining technologies could thrive together, until the next wave of disruption comes. See how music editors are linked to steam engines? Not too close, but each represents one aspect of the twin engines that drive progressive technologies; disruptors breed sustainers, and sustainers feed disruptors. This character of sustaining technologies brings us to another perspective of disruptive technologies: they not only change the way people do business, but also initiate a fresh wave of follow-up technologies that propel the disruptive technology to success. Sometimes, sustaining technologies manage to carve out a niche market for its own even when the disruptive initiator has already shut down. Music editor and maker softwares continue to healthily thrive, despite Napster's breakdown (though many other file sharing services are functioning by that time), with products like the AV Music Morpher Gold and Sound Forge 8. 
A disruptive technology is also different from a paradigm shift, which Thomas Kuhn used to describe "the process and result of a change in basic assumptions within the ruling theory of science". In disruptive technologies, there are no assumptions, but only the rules of game of which the change is brought about by the behaviors of market incumbents and new entrants. They augment different markets that eventually merge. In Clayton Christensen's words, newcomers to the industry almost invariably "crush the incumbents". While researching on disruptive [*firv*|https://complextime.com/friv-everything-you-need-to-know-about-it/] technologies, I came across this one simple line that could adequately capture what these technologies are about, "A technology that no one in business wants but that goes on to be a trillion-dollar industry." Interesting how a brand new technology that seemingly bears little value could shake up an entire industry, isn't it?
[jira] [Created] (ARROW-13177) Technology Acceptance juego friv Model
Abigail Cole created ARROW-13177: Summary: Technology Acceptance juego friv Model Key: ARROW-13177 URL: https://issues.apache.org/jira/browse/ARROW-13177 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Advances in computing and information technology are changing the way people meet and communicate. People can meet, talk, and work together outside traditional meeting and office spaces. For instance, with the introduction of software designed to help people schedule meetings and facilitate decision or learning processes, is weakening geographical constraints and changing interpersonal communication dynamics. Information technology is also dramatically affecting the way people teach and learn. As new information technologies infiltrate workplaces, home, and classrooms, research on user acceptance of new technologies has started to receive much attention from professionals as well as academic researchers. Developers and software industries are beginning to realize that lack of user acceptance of technology can lead to loss of money and resources. In studying user acceptance and use of technology, the TAM is one of the most cited models. The Technology Acceptance Model (TAM) was developed by Davis to explain computer-usage behavior. The theoretical basis of the model was Fishbein and Ajzen's Theory of Reasoned Action (TRA). The Technology Acceptance Model (TAM) is an information systems (System consisting of the network of all communication channels used within an organization) theory that models how users come to accept and use a technology, The model suggests that when users are presented with a new software package, a number of factors influence their decision about how and when they will use it, notably: Perceived usefulness (PU) - This was defined by Fred Davis as "the degree to which a person believes that using a particular system would enhance his or her job performance". 
Perceived ease-of-use (PEOU) Davis defined this as "the degree to which a person believes that using a particular system would be free from effort" (Davis, 1989). The goal of TAM is "to provide an explanation of the determinants of computer acceptance that is general, capable of explaining user behavior across a broad range of end-user computing technologies and user populations, while at the same time being both parsimonious and theoretically justified". According to the TAM, if a user perceives a specific technology as useful, she/he will believe in a positive use-performance relationship. Since effort is a finite resource, a user is likely to accept an application when she/he perceives it as easier to use than another .As a consequence, educational technology with a high level of PU and PEOU is more likely to induce positive perceptions. The relation between PU and PEOU is that PU mediates the effect of PEOU on attitude and intended use. In other words, while PU has direct impacts on attitude and use, PEOU influences attitude and use indirectly through PU. User acceptance is defined as "the demonstrable willingness within a user group to employ information technology for the tasks it is designed to support" (Dillon & Morris). Although this definition focuses on planned and intended uses of technology, studies report that individual perceptions of information technologies are likely to be influenced by the objective characteristics of technology, as well as interaction with other users. For example, the extent to which one evaluates new technology as useful, she/he is likely to use it. At the same time, her/his perception of the system is influenced by the way people around her/him evaluate and use the system. Studies on information technology continuously report that user attitudes are important factors affecting the success of the system. For the past several decades, many definitions of attitude have been proposed. 
However, all theories consider attitude to be a relationship between a person and an [*juego friv*|https://complextime.com/friv-everything-you-need-to-know-about-it/] ** object (Woelfel, 1995). In the context of information technologies, is an approach to the study of attitude - the technology acceptance model (TAM). TAM suggests users formulate a positive attitude toward the technology when they perceive the technology to be useful and easy to use (Davis, 1989). A review of scholarly research on IS acceptance and usage suggests that TAM has emerged as one of the most influential models in this stream of research The TAM represents an important theoretical contribution toward understanding IS usage and IS acceptance behaviors. However, this model -- with its original emphasis on the design of system characteristics - does not account for social influence in the adoption and utilization of new information systems. --
[jira] [Created] (ARROW-13176) T Is for Technology in Triathlon Training
Abigail Cole created ARROW-13176: Summary: T Is for Technology in Triathlon Training Key: ARROW-13176 URL: https://issues.apache.org/jira/browse/ARROW-13176 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole The original triathletes were amazing. Dave Scott and Mark Allen accomplished amazing feats in triathlon long before technology took over the sport. They didn't have metrics like we have today and they certainly didn't have all of the information gathering abilities we have. Yet, they set records and competed valiantly. In fact Mark Allen still holds the marathon record in Kona to this day. Technology is a great friend to triathletes but is does have a downside. TECHNOLOGY ITEMS So technology has taken over every part of triathlon. One of the most widely researched areas is the area of the triathlon watch. Each and every year there are new watches available for purchase that have ever increasing measurements for the triathlete. My personal favorite is the Garmin 910XT. This watch gives me heart rate, power (with a power meter), pacing (with optional foot pod), speed, cadence (with optional cadence sensor), mileage, yards in swimming, and much more. Each of these measurements aid me in measuring my success or failures in each and every training session and race. Technology has been making huge strides in bicycles and wheel sets. The amount of research going into these two items within the world of triathlon is incredible. Each and every year there are new and exciting advances in aerodynamic speed in bicycles and wheel sets. Much of the time these technologies can take on two very different vantage points. This was most evident at the 2016 World Championships in Kona. Diamond Bikes unveiled their Andean bike which fills in all the space in between the front tire and the back tire with a solid piece to make the wind pass by this area for aerodynamics. Another bike debuted at Kona this year with the exact opposite idea. 
The Ventum bike eliminated the down tube of the bike and made a vacant space in between the front tire and the back tire with only the top tube remaining. These are two very different ideas about aerodynamics. This is one of the amazing things about the advancement of technology and one of the downsides as well. Each and every piece of equipment in triathlon is undergoing constant technology advancements. Shoes, wetsuits, socks, nutrition, hats, sunglasses, helmets, racing kits, and anything else you can imagine. This world of technology in triathlon is not near to completion and will continue to push the limits. THE UPSIDE TO TECHNOLOGY Technology in triathlon is amazing. These new items are exciting and make each and every year different. There are new advancements that help triathletes go faster and longer. These new technologies help even the amateur triathlete to go faster. Just the purchase of new wheels can mean the difference between being on or off the podium. The advancement of shoes has aided many athletes to avoid the injuries that plague so many such as plantar fasciitis. Technology will continue to aid the sport in becoming better and better. THE DOWNSIDE TO TECHNOLOGY The downside to technology is that the amateur triathlete arrives at their local race already incapable of winning because someone else has the money to buy some of the latest technology. The biggest purchases such as wheel sets and bicycles can be cost prohibitive to the average triathlete and yet *[friv.com|https://complextime.com/friv-everything-you-need-to-know-about-it/]* there are individuals who purchase these items at alarming rates. The amateur triathlete can also feel overwhelmed at what to purchase and what not to purchase. Some items of technology are not worth the extra cost because they do not decrease racing time significantly enough for what they cost. Now that these new technologies have been out awhile, knock-offs have begun to make lower cost items. 
It will be interesting to watch the flood of these knock-offs into the market and see how that affects the big boys of technology. If you are an amateur triathlete shop smart and don't go buy the new gadgets just because they are new. Make sure to invest in items that are going to truly make you faster and not just a gimmick. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13175) Technology Trends That Will f9zone Dominate 2017
Abigail Cole created ARROW-13175: Summary: Technology Trends That Will f9zone Dominate 2017 Key: ARROW-13175 URL: https://issues.apache.org/jira/browse/ARROW-13175 Project: Apache Arrow Issue Type: Bug Reporter: Abigail Cole Technology has remarkably changed the way we live today, there is no denial to it. Compared with our ancestors, we stand far away from them in using different technologies for our day-to-day works. So many technologies are developed in the past couple of years that have revolutionized our lives, and it's impossible to list each of them. Though technology changes fast with time, we can observe the trends in which it changes. Last year, 2016 had bought so many fresh innovative ideas and creations towards automation and integration etc., and this year 2017 is expected to continue the similar kind of trend. In this article, we are going to discuss some of the notable trends for this year, which will make us look beyond the horizon. Gartner's 2016 Hype Cycle for emerging technologies have identified different technologies that will be trending this year. The cycle illustrates the fact how technology innovations are redefining the relations between the customer and marketer. This year, Gartner has identified Blockchains, Connected Homes, Cognitive Expert Advisors, Machine Learning, Software-defined Security etc. as the overarching technology trends, which have the potential of reshaping the business models and offering enterprises the definite route to emerging markets and ecosystems. #1. Blockchain Popularly known as 'Distributed Ledger Technology' for both financial and non-financial transactions, is one of the mystifying concepts that technologists could only understand to the fullest. Various advancements in blockchain have helped many people and more businesses in 2016, to experience its potential in banking and finance industry. 
This year, it is anticipated that blockchain technology would go beyond just banking sector, helping the start-ups and established businesses to address the market needs with different application offerings. #2. Internet of Things & Smart Home Tech With the advent of IoT, we are already eyeing the world of inter-connected things, aren't we? Our dreams of living in smart homes are met to a certain extent in 2016. So, what is stopping us from fulfilling our dreams of living in smart connected homes? Well, the fact is that the market is full of abundant individual appliances and apps, but only a little amount of solutions integrate them into a single, inclusive user experience. It is anticipated that 2017 will notice this trend to undergo a big step towards fulfilling our dreams. #3. Artificial Intelligence & Machine Learning In the recent times, Artificial Intelligence and Machine Learning have taken the entire world by storm with its amazing inventions and innovative technologies. By observing the on-going advancements in this field, it will be no longer an imagination to experience the world where robots and machine will dominate the society. Last year, we have witnessed the rise of ML algorithms on almost all major e-commerce portals and its associated mobile apps, which is further expected to spread across on all social networking platforms, dating websites, and matrimonial websites in 2017. #4. Software-defined Security In 2016, we have observed a significant growth for increased server security. Many organizations have started recognizing the significance of cybersecurity to enable their move of emerging as digital businesses. The growth of cloud-based infrastructure is causing a great demand for managing unstructured data, and moreover, the lack of technical expertise and threat to data security, are the key factors hindering the substantial growth of software-defined security market this year. #5. 
Automation Automation will be the mainstay throughout 2017, the coming years will be transformative for IT industry, enabling the automation of human performed tasks. When Machine Learning is combined with automation, the marketers are likely to witness wide business opportunities with enriched market results. #6. Augmented Reality (AR) & Virtual Reality (VR) AR and VR transform the way users interact with each other and software systems. The year 2016 has experienced path-breaking steps in AR and VR technology. With the launch of Oculus Rift, the market had received an overwhelming response from the users, making way to a plethora of VR-based apps and games. Further, when Pokémon Go was released, it has completely re-defined the definition of gaming experience. It was one of the most profitable and downloaded the mobile application of 2016. The response AR and VR technology has received last year was farfetched, and it forecasts that the world is ready to adopt this
[jira] [Updated] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-13174: -- Description: To convert timestamps to a string representation with an arbitrary format we require a strftime kernel (the inverse operation of the {{strptime}} kernel we already have). See [comments here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. was: To express timestamps with arbitrary format we require a strftime kernel. See [comments here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. > [C+][Compute] Add strftime kernel > - > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > To convert timestamps to a string representation with an arbitrary format we > require a strftime kernel (the inverse operation of the {{strptime}} kernel > we already have). > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12744) [C++][Compute] Add rounding kernel
[ https://issues.apache.org/jira/browse/ARROW-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369415#comment-17369415 ] Eduardo Ponce commented on ARROW-12744: --- A draft PR is available that implements a *round* function as a unary scalar function. It outputs float64 for integral inputs and a matching type for floating-point inputs. Rounding behavior is controlled via two options: a rounding mode (which specifies the displacement behavior) and a multiple (scale and precision). Feedback is welcome on the implementation, the rounding options and their names, and the documentation. > [C++][Compute] Add rounding kernel > -- > > Key: ARROW-12744 > URL: https://issues.apache.org/jira/browse/ARROW-12744 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Ian Cook >Assignee: Eduardo Ponce >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Kernel to round an array of floating point numbers. Should return an array of > the same type as the input. Should have an option to control how many digits > after the decimal point (default value 0 meaning round to the nearest > integer). > Midpoint values (e.g. 0.5 rounded to nearest integer) should round away from > zero (up for positive numbers, down for negative numbers). -- This message was sent by Atlassian Jira (v8.3.4#803005)
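The midpoint rule requested here ("round ties away from zero", unlike Python's built-in banker's rounding) can be sketched with the standard library. This is an illustration of the behavior, not the Arrow compute API; the helper name and signature are hypothetical.

```python
# Sketch of the "half away from zero" midpoint rule described in the ticket,
# using only the standard library -- not Arrow's draft round kernel.
from decimal import Decimal, ROUND_HALF_UP  # ROUND_HALF_UP = ties away from zero

def round_away_from_zero(value, ndigits=0):
    """Round ties away from zero: 0.5 -> 1.0, -0.5 -> -1.0 (hypothetical helper)."""
    exp = Decimal(1).scaleb(-ndigits)  # e.g. ndigits=1 -> Decimal('0.1')
    return float(Decimal(str(value)).quantize(exp, rounding=ROUND_HALF_UP))

print([round_away_from_zero(v) for v in (0.5, 1.5, -0.5, 2.4)])
```

Note the contrast with the built-in: `round(0.5)` is `0` in Python (round half to even), while the rule above yields `1.0`.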
[jira] [Assigned] (ARROW-13174) [C++][Compute] Add strftime kernel
[ https://issues.apache.org/jira/browse/ARROW-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc reassigned ARROW-13174: -- Assignee: Rok Mihevc > [C+][Compute] Add strftime kernel > - > > Key: ARROW-13174 > URL: https://issues.apache.org/jira/browse/ARROW-13174 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Major > > To express timestamps with arbitrary format we require a strftime kernel. > See [comments > here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-13133) [R] Add support for locale-specific day of week (and month of year?) returns from timestamp accessor functions
[ https://issues.apache.org/jira/browse/ARROW-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369370#comment-17369370 ] Rok Mihevc commented on ARROW-13133: https://issues.apache.org/jira/browse/ARROW-13174 > [R] Add support for locale-specific day of week (and month of year?) returns > from timestamp accessor functions > -- > > Key: ARROW-13133 > URL: https://issues.apache.org/jira/browse/ARROW-13133 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nic Crane >Priority: Major > > The R binding for the wday date accessor added in this PR > [https://github.com/apache/arrow/pull/10507] currently doesn't support > returning the string representation of the day of the week (e.g. "Mon") and > only supports the numeric representation (e.g. 1). > We should implement this, though discussion should be had about whether this > belongs at the R or C++ level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-13133) [R] Add support for locale-specific day of week (and month of year?) returns from timestamp accessor functions
[ https://issues.apache.org/jira/browse/ARROW-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369370#comment-17369370 ] Rok Mihevc edited comment on ARROW-13133 at 6/25/21, 10:09 AM: --- ARROW-13174 was (Author: rokm): https://issues.apache.org/jira/browse/ARROW-13174 > [R] Add support for locale-specific day of week (and month of year?) returns > from timestamp accessor functions > -- > > Key: ARROW-13133 > URL: https://issues.apache.org/jira/browse/ARROW-13133 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Nic Crane >Priority: Major > > The R binding for the wday date accessor added in this PR > [https://github.com/apache/arrow/pull/10507] currently doesn't support > returning the string representation of the day of the week (e.g. "Mon") and > only supports the numeric representation (e.g. 1). > We should implement this, though discussion should be had about whether this > belongs at the R or C++ level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
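The feature requested above, a string day-of-week alongside the numeric one, can be illustrated with the standard library (for the idea only; the actual binding would live in Arrow's R/C++ layer):

```python
# What the ticket asks for, sketched with the standard library: a numeric
# day of week plus its abbreviated name. The name is locale-dependent.
from datetime import date

d = date(2021, 6, 25)                    # a Friday
print(d.isoweekday(), d.strftime("%a"))  # day number and abbreviated name
```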
[jira] [Created] (ARROW-13174) [C++][Compute] Add strftime kernel
Rok Mihevc created ARROW-13174: -- Summary: [C++][Compute] Add strftime kernel Key: ARROW-13174 URL: https://issues.apache.org/jira/browse/ARROW-13174 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Rok Mihevc To express timestamps with an arbitrary format we require a strftime kernel. See [comments here|https://github.com/apache/arrow/pull/10598#issuecomment-868358466]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
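The strptime/strftime relationship the ticket relies on (strftime being the inverse of the existing {{strptime}} kernel) can be shown with the standard library; this is Python's datetime, not Arrow's compute API:

```python
# The strptime/strftime round-trip the ticket describes, illustrated with
# Python's standard library rather than the Arrow compute API.
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
ts = datetime.strptime("2021-06-25 10:09:00", fmt)  # string -> timestamp
s = ts.strftime(fmt)                                # timestamp -> string
assert s == "2021-06-25 10:09:00"                   # round-trips exactly
```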
[jira] [Created] (ARROW-13173) [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
Yibo Cai created ARROW-13173: Summary: [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally Key: ARROW-13173 URL: https://issues.apache.org/jira/browse/ARROW-13173 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 4.0.1, 4.0.0 Reporter: Yibo Cai Observed a test case failure in a Travis CI arm64 job. https://travis-ci.com/github/apache/arrow/jobs/518630381#L2271 {{TestAsyncUtil.ReadaheadFailed}} asserted at https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_generator_test.cc#L1131 It looks like _SleepABit()_ cannot guarantee that _finished_ will be set in time, especially on busy CI hosts where many jobs share one machine. cc [~westonpace] -- This message was sent by Atlassian Jira (v8.3.4#803005)
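The race described above, sleeping a fixed amount and then asserting a flag, is the classic source of this kind of flake; waiting on an event with a generous timeout removes the timing dependence. A minimal Python sketch of the pattern (names are illustrative, not the Arrow test's):

```python
# The ticket's race: a fixed-length sleep cannot guarantee a background
# task has finished on a busy host. Blocking on an event with a generous
# timeout is deterministic. Names here are illustrative, not Arrow's.
import threading
import time

finished = threading.Event()

def background_task():
    time.sleep(0.05)  # simulate work of unpredictable duration
    finished.set()

threading.Thread(target=background_task).start()

# Fragile: time.sleep(0.01); assert finished.is_set()   # may fail under load
# Robust: wait until the flag is set, or a long timeout expires.
assert finished.wait(timeout=5.0)
```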
[jira] [Updated] (ARROW-13173) [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
[ https://issues.apache.org/jira/browse/ARROW-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibo Cai updated ARROW-13173: - Fix Version/s: 5.0.0 > [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally > - > > Key: ARROW-13173 > URL: https://issues.apache.org/jira/browse/ARROW-13173 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 4.0.0, 4.0.1 >Reporter: Yibo Cai >Priority: Major > Fix For: 5.0.0 > > > Observed one test case failure from Travis CI arm64 job. > https://travis-ci.com/github/apache/arrow/jobs/518630381#L2271 > {{TestAsyncUtil.ReadaheadFailed}} asserted at > https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_generator_test.cc#L1131 > Looks _SleepABit()_ cannot guarantee that _finished_ will be set in time, > especially on busy CI hosts where many jobs share one machine. > cc [~westonpace] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-13172) Make TYPE_WIDTH in Vector public
[ https://issues.apache.org/jira/browse/ARROW-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-13172: --- Labels: pull-request-available (was: ) > Make TYPE_WIDTH in Vector public > > > Key: ARROW-13172 > URL: https://issues.apache.org/jira/browse/ARROW-13172 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Eduard Tudenhoefner >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Some Vector classes already expose the TYPE_WIDTH publicly. It would be > helpful if all Vectors would do that -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13172) Make TYPE_WIDTH in Vector public
Eduard Tudenhoefner created ARROW-13172: --- Summary: Make TYPE_WIDTH in Vector public Key: ARROW-13172 URL: https://issues.apache.org/jira/browse/ARROW-13172 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Eduard Tudenhoefner Some Vector classes already expose TYPE_WIDTH publicly. It would be helpful if all Vector classes did that. -- This message was sent by Atlassian Jira (v8.3.4#803005)