[jira] [Resolved] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-29 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer resolved ARROW-18364.
-
Resolution: Fixed

Resolved by https://github.com/apache/arrow/pull/14675

> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-22-11-53-20-840.png, 
> image-2022-11-22-11-55-17-106.png
>
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.
> It's also worth noting that, now that GitHub issue reporting is enabled, issues 
> tracked only in GitHub issues cannot be resolved in a way that explicitly 
> records the version in which the resolution was made.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-18375) MIGRATION: Enable GitHub issue type labels

2022-11-25 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18375:

Description: 
As part of enabling GitHub issue reporting, the following labels have been 
defined and need to be added to the repository label options. Without these 
labels added, [new issues|https://github.com/apache/arrow/issues/14692] do not 
get the issue template-defined issue type labels set properly.

 

Labels:
 * Type: bug
 * Type: enhancement
 * Type: usage
 * Type: task
 * Type: test
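
A minimal sketch of how these labels could be added through the GitHub REST API "create a label" endpoint (POST /repos/{owner}/{repo}/labels). The helper name build_label_requests and the placeholder color are assumptions for illustration only; the sketch builds the request bodies and does not send anything.

```python
import json

# Label names taken from the list above.
LABEL_NAMES = [
    "Type: bug",
    "Type: enhancement",
    "Type: usage",
    "Type: task",
    "Type: test",
]


def build_label_requests(owner, repo, names, color="ededed"):
    """Return (url, json_body) pairs, one per label to create.

    Hypothetical helper: nothing is sent here. A caller would POST each
    body with an authenticated HTTP client. The color is a placeholder
    assumption, given as six hex digits without a leading '#'.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/labels"
    return [(url, json.dumps({"name": name, "color": color})) for name in names]


for url, body in build_label_requests("apache", "arrow", LABEL_NAMES):
    print("POST", url, body)
```

Building the payloads separately from sending them makes it easy to review the exact label names before anything is changed in the repository.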



 

  was:
As part of enabling GitHub issue reporting, the following labels have been 
defined and need to be added to the repository label options. Without these 
labels added, [new issues|https://github.com/apache/arrow/issues/14692] do not 
get the issue template-defined issue type labels set properly.

 

Labels:
 * Type: bug
 * Type: enhancement
 * Type: usage

 


> MIGRATION: Enable GitHub issue type labels
> --
>
> Key: ARROW-18375
> URL: https://issues.apache.org/jira/browse/ARROW-18375
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
>
> As part of enabling GitHub issue reporting, the following labels have been 
> defined and need to be added to the repository label options. Without these 
> labels added, [new issues|https://github.com/apache/arrow/issues/14692] do 
> not get the issue template-defined issue type labels set properly.
>  
> Labels:
>  * Type: bug
>  * Type: enhancement
>  * Type: usage
>  * Type: task
>  * Type: test
>  





[jira] [Commented] (ARROW-18378) MIGRATION: Disable issue reporting in ASF Jira

2022-11-23 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637963#comment-17637963
 ] 

Todd Farmer commented on ARROW-18378:
-

Vote proposed on [this mailing list 
thread|https://lists.apache.org/thread/v9sjwx8mdg0bfssbrlqz7c0wxwc8dx49].

> MIGRATION: Disable issue reporting in ASF Jira
> --
>
> Key: ARROW-18378
> URL: https://issues.apache.org/jira/browse/ARROW-18378
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>
> ARROW-18364 enabled issue reporting for Apache Arrow in GitHub issues. Even 
> though existing Jira issues have not yet been migrated and are still being 
> worked in the Jira system, we should assess disabling creation of new issues 
> in ASF Jira, and instead pointing users to GitHub issues. This may benefit 
> the project by reducing the need to monitor inflow in two discrete systems.





[jira] [Commented] (ARROW-18378) MIGRATION: Disable issue reporting in ASF Jira

2022-11-23 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637946#comment-17637946
 ] 

Todd Farmer commented on ARROW-18378:
-

This topic was raised on the Arrow Biweekly Sync call today, and agreement was 
reached to make a proposal on the mailing list for voting. With about 2 months 
remaining until the next planned release, we should have time to establish 
best practices and update merge/release tooling to use GitHub instead of 
Jira. Disabling issue reporting in Jira will act as a forcing function of 
sorts: all users (not just new users not yet registered in ASF Jira) will work 
with GitHub issues, which lets us identify problems and improve the system and 
practices incrementally.

> MIGRATION: Disable issue reporting in ASF Jira
> --
>
> Key: ARROW-18378
> URL: https://issues.apache.org/jira/browse/ARROW-18378
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
>
> ARROW-18364 enabled issue reporting for Apache Arrow in GitHub issues. Even 
> though existing Jira issues have not yet been migrated and are still being 
> worked in the Jira system, we should assess disabling creation of new issues 
> in ASF Jira, and instead pointing users to GitHub issues. This may benefit 
> the project by reducing the need to monitor inflow in two discrete systems.





[jira] [Assigned] (ARROW-18378) MIGRATION: Disable issue reporting in ASF Jira

2022-11-23 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-18378:
---

Assignee: Todd Farmer

> MIGRATION: Disable issue reporting in ASF Jira
> --
>
> Key: ARROW-18378
> URL: https://issues.apache.org/jira/browse/ARROW-18378
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>
> ARROW-18364 enabled issue reporting for Apache Arrow in GitHub issues. Even 
> though existing Jira issues have not yet been migrated and are still being 
> worked in the Jira system, we should assess disabling creation of new issues 
> in ASF Jira, and instead pointing users to GitHub issues. This may benefit 
> the project by reducing the need to monitor inflow in two discrete systems.





[jira] [Comment Edited] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-22 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637424#comment-17637424
 ] 

Todd Farmer edited comment on ARROW-18364 at 11/22/22 11:21 PM:


(Moved previous comment regarding milestones to [this 
issue|https://issues.apache.org/jira/browse/ARROW-18381?focusedCommentId=17637500]
 for better tracking.)


was (Author: JIRAUSER288796):
(Moved previous comment regarding milestones to this issue for better tracking.)

> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-22-11-53-20-840.png, 
> image-2022-11-22-11-55-17-106.png
>
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.
> It's also worth noting that, now that GitHub issue reporting is enabled, issues 
> tracked only in GitHub issues cannot be resolved in a way that explicitly 
> records the version in which the resolution was made.





[jira] [Comment Edited] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-22 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637424#comment-17637424
 ] 

Todd Farmer edited comment on ARROW-18364 at 11/22/22 11:21 PM:


(Moved previous comment regarding milestones to this issue for better tracking.)


was (Author: JIRAUSER288796):
(Moved previous comment regarding milestones to this issue for better tracking.)

> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-22-11-53-20-840.png, 
> image-2022-11-22-11-55-17-106.png
>
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.
> It's also worth noting that, now that GitHub issue reporting is enabled, issues 
> tracked only in GitHub issues cannot be resolved in a way that explicitly 
> records the version in which the resolution was made.





[jira] [Comment Edited] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-22 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637424#comment-17637424
 ] 

Todd Farmer edited comment on ARROW-18364 at 11/22/22 11:20 PM:


(Moved previous comment regarding milestones to this issue for better tracking.)


was (Author: JIRAUSER288796):
While possible to map existing ASF Jira Apache Arrow project versions to GitHub 
milestones, there are some decisions needed to be made. The existing ASF Jira 
Apache Arrow versions can be seen via API 
[here|https://issues.apache.org/jira/rest/api/2/project/ARROW/version], and 
I've done an initial import to a personal GitHub repo, which can be seen via 
API 
[here|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=all].
 Some notes:
h2. Milestone name

This is set to the version pulled from ASF Jira. It is not prefixed with "v" 
(e.g., "v1.0.0") or anything. There exist some [80+ Jira issues with 
fixVersions prefixed with 
"JS-"|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20fixVersion%20IN(%20JS-0.3.0%2C%20JS-0.3.1%2C%20JS-0.4.0%2C%20JS-0.4.1)].
 To retain this metadata, I presume we will want to create corresponding 
milestones. Everything else appears to have reasonable/expected numeric values 
for version name.
h2. Milestone status

GitHub supports "open" or "closed" milestone statuses. ASF Jira versions have 
discrete "archived" and "released" boolean fields. I propose mapping anything 
that is either archived or released to a "closed" milestone status. The rest 
will be set "open". Currently, this maps the following as 
[open|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=open]:
 * 9.0.1
 * 10.0.1
 * 10.0.2
 * 11.0.0
 * 12.0.0

 
h2. Milestone date metadata

GitHub milestones support a "due on" date. This is the only date field that can 
be set during [creation or update 
operations|https://docs.github.com/en/rest/issues/milestones]. GitHub also 
tracks dates that milestones were created, last updated, and closed - but those 
cannot be explicitly set. This is a little annoying, because these seem to be 
what get displayed in the web interface for closed milestones:

!image-2022-11-22-11-53-20-840.png!

Open milestones reference the "due on" date, which for open milestones, is not 
yet defined:

!image-2022-11-22-11-55-17-106.png!

While I am using the Jira version releaseDate field to map to the "due on" 
field in the corresponding GitHub project milestone, it's not clear that this 
is ever displayed or useful in the context of web views.

 
h2. Multiple fix versions

Some 314 Apache Arrow Jira issues are associated with multiple fix versions. 
GitHub does not allow associating an issue with multiple milestones; adding a 
second milestone replaces the existing milestone association. This means that 
314 legacy issues will lose metadata associating with at least one version that 
it was associated with in Jira. Is this acceptable? Should the lowest or 
highest associated Jira version be used during import, if so?

> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-22-11-53-20-840.png, 
> image-2022-11-22-11-55-17-106.png
>
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.
> It's also worth noting that, now that GitHub issue reporting is enabled, issues 
> tracked only in GitHub issues cannot be resolved in a way that explicitly 
> records the version in which the resolution was made.





[jira] [Updated] (ARROW-18381) MIGRATION: Create milestones for every needed fix version

2022-11-22 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18381:

Attachment: Screenshot from 2022-11-22 11-53-07.png

> MIGRATION: Create milestones for every needed fix version
> -
>
> Key: ARROW-18381
> URL: https://issues.apache.org/jira/browse/ARROW-18381
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
> Attachments: Screenshot from 2022-11-22 11-53-07.png, Screenshot from 
> 2022-11-22 11-54-26.png
>
>
> The Apache Arrow project uses the "Fix version" field in ASF Jira issues to 
> track the version in which issues were resolved/fixed/implemented. The most 
> equivalent field in GitHub issues is the "milestone" field. This field is 
> explicitly managed - the versions need to be added to the repository 
> configuration before they can be used. This mapping needs to be established 
> as a prerequisite for completing the import from ASF Jira.





[jira] [Commented] (ARROW-18381) MIGRATION: Create milestones for every needed fix version

2022-11-22 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637500#comment-17637500
 ] 

Todd Farmer commented on ARROW-18381:
-

While it is possible to map existing ASF Jira Apache Arrow project versions to 
GitHub milestones, some decisions still need to be made. The existing ASF Jira 
Apache Arrow versions can be seen via API 
[here|https://issues.apache.org/jira/rest/api/2/project/ARROW/version], and 
I've done an initial import to a personal GitHub repo, which can be seen via 
API 
[here|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=all].
 Some notes:
h2. Milestone name

This is set to the version pulled from ASF Jira. It is not prefixed with "v" 
(e.g., "v1.0.0") or anything. There exist some [80+ Jira issues with 
fixVersions prefixed with 
"JS-"|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20fixVersion%20IN(%20JS-0.3.0%2C%20JS-0.3.1%2C%20JS-0.4.0%2C%20JS-0.4.1)].
 To retain this metadata, I presume we will want to create corresponding 
milestones. Everything else appears to have reasonable/expected numeric values 
for version name.
h2. Milestone status

GitHub supports "open" or "closed" milestone statuses. ASF Jira versions have 
discrete "archived" and "released" boolean fields. I propose mapping anything 
that is either archived or released to a "closed" milestone status. The rest 
will be set "open". Currently, this maps the following as 
[open|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=open]:
 * 9.0.1
 * 10.0.1
 * 10.0.2
 * 11.0.0
 * 12.0.0
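
The proposed status mapping can be sketched as follows. This is only an illustration: the function name milestone_state is hypothetical, and the sample records are made-up values shaped like Jira version objects (with the "archived" and "released" booleans described above), not data copied from the Jira API.

```python
def milestone_state(jira_version):
    """Map one Jira version record to a GitHub milestone state.

    Anything archived or released becomes "closed"; the rest stay "open".
    Missing flags are treated as False (an assumption of this sketch).
    """
    if jira_version.get("archived", False) or jira_version.get("released", False):
        return "closed"
    return "open"


# Illustrative records only; the flag values are invented for the example.
sample_versions = [
    {"name": "10.0.0", "archived": False, "released": True},
    {"name": "11.0.0", "archived": False, "released": False},
    {"name": "JS-0.3.0", "archived": True, "released": False},
]

for version in sample_versions:
    print(version["name"], "->", milestone_state(version))
```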

h2. Milestone date metadata

GitHub milestones support a "due on" date. This is the only date field that can 
be set during [creation or update 
operations|https://docs.github.com/en/rest/issues/milestones]. GitHub also 
tracks dates that milestones were created, last updated, and closed - but those 
cannot be explicitly set. This is a little annoying, because these seem to be 
what get displayed in the web interface for closed milestones:

!Screenshot from 2022-11-22 11-53-07.png!

Open milestones reference the "due on" date, which for open milestones, is not 
yet defined:

!Screenshot from 2022-11-22 11-54-26.png!

While I am using the Jira version releaseDate field to map to the "due on" 
field in the corresponding GitHub project milestone, it's not clear that this 
is ever displayed or useful in the context of web views.
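
As a rough sketch of the releaseDate to "due on" mapping: the code below assumes the Jira releaseDate is a plain YYYY-MM-DD string and that the GitHub "due_on" value is sent as an ISO 8601 timestamp. Pinning the time to midnight UTC is an assumption of this sketch, and the helper names to_due_on and milestone_payload are hypothetical.

```python
from datetime import date


def to_due_on(release_date):
    """Convert a Jira releaseDate (YYYY-MM-DD) to an ISO 8601 timestamp.

    Returns None when the version has no release date yet.
    Midnight UTC is an arbitrary choice made for this sketch.
    """
    if not release_date:
        return None
    parsed = date.fromisoformat(release_date)  # raises ValueError on bad input
    return f"{parsed.isoformat()}T00:00:00Z"


def milestone_payload(jira_version):
    """Build the body for a milestone create or update request."""
    payload = {"title": jira_version["name"]}
    due_on = to_due_on(jira_version.get("releaseDate"))
    if due_on is not None:
        payload["due_on"] = due_on
    return payload


# Illustrative records only, not data copied from the Jira API.
print(milestone_payload({"name": "10.0.0", "releaseDate": "2022-10-26"}))
print(milestone_payload({"name": "12.0.0"}))
```

Leaving "due_on" out of the payload for versions without a release date keeps open milestones undated, matching the behavior described above.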
h2. Multiple fix versions

Some 314 Apache Arrow Jira issues are associated with multiple fix versions. 
GitHub does not allow associating an issue with multiple milestones; adding a 
second milestone replaces the existing milestone association. This means that 
these 314 legacy issues will each lose the association with at least one 
version recorded in Jira. Is this acceptable? If so, should the lowest or 
highest associated Jira version be used during import?
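
If a single milestone has to be chosen, the selection could look something like the sketch below. It is not a decided policy: the names version_key and pick_milestone are hypothetical, and sorting prefixed names such as "JS-0.3.0" separately from plain numeric versions is an assumption of this example.

```python
import re

# Optional alphabetic prefix ("JS-"), then dot-separated numeric parts.
_VERSION_RE = re.compile(r"(?:(?P<prefix>[A-Za-z]+)-)?(?P<nums>\d+(?:\.\d+)*)")


def version_key(name):
    """Sort key that compares version parts numerically, not as text."""
    match = _VERSION_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized version name: {name!r}")
    prefix = match.group("prefix") or ""
    numbers = tuple(int(part) for part in match.group("nums").split("."))
    return (prefix, numbers)


def pick_milestone(fix_versions, prefer="lowest"):
    """Choose the one fix version to keep as the GitHub milestone."""
    if prefer not in ("lowest", "highest"):
        raise ValueError("prefer must be 'lowest' or 'highest'")
    if not fix_versions:
        return None
    chooser = min if prefer == "lowest" else max
    return chooser(fix_versions, key=version_key)


print(pick_milestone(["10.0.1", "9.0.1", "11.0.0"]))
print(pick_milestone(["10.0.1", "9.0.1", "11.0.0"], prefer="highest"))
```

Comparing the parts as integers matters here: a plain string comparison would rank "10.0.1" below "9.0.1".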

> MIGRATION: Create milestones for every needed fix version
> -
>
> Key: ARROW-18381
> URL: https://issues.apache.org/jira/browse/ARROW-18381
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
> Attachments: Screenshot from 2022-11-22 11-53-07.png, Screenshot from 
> 2022-11-22 11-54-26.png
>
>
> The Apache Arrow project uses the "Fix version" field in ASF Jira issues to 
> track the version in which issues were resolved/fixed/implemented. The most 
> equivalent field in GitHub issues is the "milestone" field. This field is 
> explicitly managed - the versions need to be added to the repository 
> configuration before they can be used. This mapping needs to be established 
> as a prerequisite for completing the import from ASF Jira.





[jira] [Updated] (ARROW-18381) MIGRATION: Create milestones for every needed fix version

2022-11-22 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18381:

Attachment: Screenshot from 2022-11-22 11-54-26.png

> MIGRATION: Create milestones for every needed fix version
> -
>
> Key: ARROW-18381
> URL: https://issues.apache.org/jira/browse/ARROW-18381
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
> Attachments: Screenshot from 2022-11-22 11-53-07.png, Screenshot from 
> 2022-11-22 11-54-26.png
>
>
> The Apache Arrow project uses the "Fix version" field in ASF Jira issues to 
> track the version in which issues were resolved/fixed/implemented. The most 
> equivalent field in GitHub issues is the "milestone" field. This field is 
> explicitly managed - the versions need to be added to the repository 
> configuration before they can be used. This mapping needs to be established 
> as a prerequisite for completing the import from ASF Jira.





[jira] [Comment Edited] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-22 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637424#comment-17637424
 ] 

Todd Farmer edited comment on ARROW-18364 at 11/22/22 7:02 PM:
---

While it is possible to map existing ASF Jira Apache Arrow project versions to 
GitHub milestones, some decisions still need to be made. The existing ASF Jira 
Apache Arrow versions can be seen via API 
[here|https://issues.apache.org/jira/rest/api/2/project/ARROW/version], and 
I've done an initial import to a personal GitHub repo, which can be seen via 
API 
[here|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=all].
 Some notes:
h2. Milestone name

This is set to the version pulled from ASF Jira. It is not prefixed with "v" 
(e.g., "v1.0.0") or anything. There exist some [80+ Jira issues with 
fixVersions prefixed with 
"JS-"|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20fixVersion%20IN(%20JS-0.3.0%2C%20JS-0.3.1%2C%20JS-0.4.0%2C%20JS-0.4.1)].
 To retain this metadata, I presume we will want to create corresponding 
milestones. Everything else appears to have reasonable/expected numeric values 
for version name.
h2. Milestone status

GitHub supports "open" or "closed" milestone statuses. ASF Jira versions have 
discrete "archived" and "released" boolean fields. I propose mapping anything 
that is either archived or released to a "closed" milestone status. The rest 
will be set "open". Currently, this maps the following as 
[open|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=open]:
 * 9.0.1
 * 10.0.1
 * 10.0.2
 * 11.0.0
 * 12.0.0

 
h2. Milestone date metadata

GitHub milestones support a "due on" date. This is the only date field that can 
be set during [creation or update 
operations|https://docs.github.com/en/rest/issues/milestones]. GitHub also 
tracks dates that milestones were created, last updated, and closed - but those 
cannot be explicitly set. This is a little annoying, because these seem to be 
what get displayed in the web interface for closed milestones:

!image-2022-11-22-11-53-20-840.png!

Open milestones reference the "due on" date, which for open milestones, is not 
yet defined:

!image-2022-11-22-11-55-17-106.png!

While I am using the Jira version releaseDate field to map to the "due on" 
field in the corresponding GitHub project milestone, it's not clear that this 
is ever displayed or useful in the context of web views.

 
h2. Multiple fix versions

Some 314 Apache Arrow Jira issues are associated with multiple fix versions. 
GitHub does not allow associating an issue with multiple milestones; adding a 
second milestone replaces the existing milestone association. This means that 
these 314 legacy issues will each lose the association with at least one 
version recorded in Jira. Is this acceptable? If so, should the lowest or 
highest associated Jira version be used during import?


was (Author: JIRAUSER288796):
While possible to map existing ASF Jira Apache Arrow project versions to GitHub 
milestones, there are some decisions needed to be made. The existing ASF Jira 
Apache Arrow versions can be seen via API 
[here|https://issues.apache.org/jira/rest/api/2/project/ARROW/version], and 
I've done an initial import to a personal GitHub repo, which can be seen via 
API 
[here|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=all].
 Some notes:
h2. Milestone name

This is set to the version pulled from ASF Jira. It is not prefixed with "v" 
(e.g., "v1.0.0") or anything. There exist some [80+ Jira issues with 
fixVersions prefixed with 
"JS-"|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20fixVersion%20IN(%20JS-0.3.0%2C%20JS-0.3.1%2C%20JS-0.4.0%2C%20JS-0.4.1)].
 To retain this metadata, I presume we will want to create corresponding 
milestones. Everything else appears to have reasonable/expected numeric values 
for version name.
h2. Milestone status

GitHub supports "open" or "closed" milestone statuses. ASF Jira versions have 
discrete "archived" and "released" boolean fields. I propose mapping anything 
that is either archived or released to a "closed" milestone status. The rest 
will be set "open". Currently, this maps the following as 
[open|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=open]:


 * 9.0.1
 * 10.0.1
 * 10.0.2
 * 11.0.0
 * 12.0.0

 
h2. Milestone date metadata

GitHub milestones support a "due on" date. This is the only date field that can 
be set during [creation or update 
operations|https://docs.github.com/en/rest/issues/milestones]. GitHub also 
tracks dates that milestones were created, last updated, and closed - but those 
cannot be explicitly set. This is a little annoying, because these seem to be 
what get displayed in the web interface for closed milestones:

!image-2022-11-22-11-53-20-840.png!


[jira] [Commented] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-22 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637424#comment-17637424
 ] 

Todd Farmer commented on ARROW-18364:
-

While it is possible to map existing ASF Jira Apache Arrow project versions to 
GitHub milestones, some decisions still need to be made. The existing ASF Jira 
Apache Arrow versions can be seen via API 
[here|https://issues.apache.org/jira/rest/api/2/project/ARROW/version], and 
I've done an initial import to a personal GitHub repo, which can be seen via 
API 
[here|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=all].
 Some notes:
h2. Milestone name

This is set to the version pulled from ASF Jira. It is not prefixed with "v" 
(e.g., "v1.0.0") or anything. There exist some [80+ Jira issues with 
fixVersions prefixed with 
"JS-"|https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20fixVersion%20IN(%20JS-0.3.0%2C%20JS-0.3.1%2C%20JS-0.4.0%2C%20JS-0.4.1)].
 To retain this metadata, I presume we will want to create corresponding 
milestones. Everything else appears to have reasonable/expected numeric values 
for version name.
h2. Milestone status

GitHub supports "open" or "closed" milestone statuses. ASF Jira versions have 
discrete "archived" and "released" boolean fields. I propose mapping anything 
that is either archived or released to a "closed" milestone status. The rest 
will be set "open". Currently, this maps the following as 
[open|https://api.github.com/repos/toddfarmer/test-arrow-config/milestones?state=open]:


 * 9.0.1
 * 10.0.1
 * 10.0.2
 * 11.0.0
 * 12.0.0

 
h2. Milestone date metadata

GitHub milestones support a "due on" date. This is the only date field that can 
be set during [creation or update 
operations|https://docs.github.com/en/rest/issues/milestones]. GitHub also 
tracks dates that milestones were created, last updated, and closed - but those 
cannot be explicitly set. This is a little annoying, because these seem to be 
what get displayed in the web interface for closed milestones:

!image-2022-11-22-11-53-20-840.png!

Open milestones reference the "due on" date, which for open milestones, is not 
yet defined:

!image-2022-11-22-11-55-17-106.png!

While I am using the Jira version releaseDate field to map to the "due on" 
field in the corresponding GitHub project milestone, it's not clear that this 
is ever displayed or useful in the context of web views.

 
h2. Multiple fix versions

Some 314 Apache Arrow Jira issues are associated with multiple fix versions. 
GitHub does not allow associating an issue with multiple milestones; adding a 
second milestone replaces the existing milestone association. This means that 
these 314 legacy issues will each lose the association with at least one 
version recorded in Jira. Is this acceptable? If so, should the lowest or 
highest associated Jira version be used during import?

> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-22-11-53-20-840.png, 
> image-2022-11-22-11-55-17-106.png
>
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.
> It's also worth noting that, now that GitHub issue reporting is enabled, issues 
> tracked only in GitHub issues cannot be resolved in a way that explicitly 
> records the version in which the resolution was made.





[jira] [Updated] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-22 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18364:

Attachment: image-2022-11-22-11-55-17-106.png

> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-22-11-53-20-840.png, 
> image-2022-11-22-11-55-17-106.png
>
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.
> It's also worth noting that, now that GitHub issue reporting is enabled, issues 
> tracked only in GitHub issues cannot be resolved in a way that explicitly 
> records the version in which the resolution was made.





[jira] [Updated] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-22 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18364:

Attachment: image-2022-11-22-11-53-20-840.png

> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-22-11-53-20-840.png
>
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.
> Note also that, now that GitHub issue reporting is enabled, an issue tracked 
> only in GitHub issues cannot be resolved in a way that explicitly records the 
> version in which the resolution was made.





[jira] [Updated] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-22 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18364:

Description: 
The [GitHub issue creation page for 
Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to open 
bug reports in Jira. Now that ASF Infra has disabled self-service registration 
in Jira, and in light of the pending migration of Apache Arrow issue tracking 
from ASF Jira to GitHub issues, we should enable bug reports to be submitted 
via GitHub directly. Issue templates will help distinguish bug reports and 
feature requests from existing usage assistance questions.

Note also that, now that GitHub issue reporting is enabled, an issue tracked 
only in GitHub issues cannot be resolved in a way that explicitly records the 
version in which the resolution was made.

  was:The [GitHub issue creation page for 
Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to open 
bug reports in Jira. Now that ASF Infra has disabled self-service registration 
in Jira, and in light of the pending migration of Apache Arrow issue tracking 
from ASF Jira to GitHub issues, we should enable bug reports to be submitted 
via GitHub directly. Issue templates will help distinguish bug reports and 
feature requests from existing usage assistance questions.


> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.
> Note also that, now that GitHub issue reporting is enabled, an issue tracked 
> only in GitHub issues cannot be resolved in a way that explicitly records the 
> version in which the resolution was made.





[jira] [Created] (ARROW-18381) MIGRATION: Create milestones for every needed fix version

2022-11-22 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18381:
---

 Summary: MIGRATION: Create milestones for every needed fix version
 Key: ARROW-18381
 URL: https://issues.apache.org/jira/browse/ARROW-18381
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


The Apache Arrow project uses the "Fix version" field in ASF Jira issues to 
track the version in which issues were resolved/fixed/implemented. The closest 
equivalent field in GitHub issues is the "milestone" field. This field is 
explicitly managed - the versions need to be added to the repository 
configuration before they can be used. This mapping needs to be established as 
a prerequisite for completing the import from ASF Jira.
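
As a minimal sketch of the prerequisite described above, the following Python 
snippet computes which Jira fix versions still lack a matching GitHub 
milestone. The function name `missing_milestones` and the sample version 
strings are hypothetical, chosen only for illustration. It assumes milestone 
titles are meant to match fix version names exactly, which this issue does not 
specify.

```python
def missing_milestones(fix_versions, existing_titles):
    """Return fix versions that have no milestone yet, in first-seen order.

    Assumption: a milestone "matches" a fix version when its title is
    exactly the same string.
    """
    existing = set(existing_titles)
    missing = []
    for version in fix_versions:
        # Skip empty values, versions already configured, and repeats.
        if version and version not in existing and version not in missing:
            missing.append(version)
    return missing


# Hypothetical sample data, not taken from the Arrow repository.
jira_fix_versions = ["9.0.0", "10.0.0", "10.0.1", "11.0.0", "10.0.0"]
github_milestones = ["10.0.0", "11.0.0"]

print(missing_milestones(jira_fix_versions, github_milestones))
# prints ['9.0.0', '10.0.1']
```

The result is the list of milestones that would need to be added to the 
repository configuration before the import could assign them.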





[jira] [Commented] (ARROW-18380) MIGRATION: Enable bot handling of GitHub issue linked PRs

2022-11-22 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637298#comment-17637298
 ] 

Todd Farmer commented on ARROW-18380:
-

[Here|https://github.com/apache/arrow/pull/14688#issuecomment-1322215016] is an 
example of GitHub bot comments that should be evaluated.

> MIGRATION: Enable bot handling of GitHub issue linked PRs
> -
>
> Key: ARROW-18380
> URL: https://issues.apache.org/jira/browse/ARROW-18380
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
>
> GitHub workflows for the Apache Arrow project assume that PRs reference ASF 
> Jira issues (or are minor changes). This needs to be revisited now that 
> GitHub issue reporting is enabled, as there may well be no ASF Jira issue to 
> link a PR against going forward. The resulting bot comments can be confusing.





[jira] [Created] (ARROW-18380) MIGRATION: Enable bot handling of GitHub issue linked PRs

2022-11-22 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18380:
---

 Summary: MIGRATION: Enable bot handling of GitHub issue linked PRs
 Key: ARROW-18380
 URL: https://issues.apache.org/jira/browse/ARROW-18380
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


GitHub workflows for the Apache Arrow project assume that PRs reference ASF 
Jira issues (or are minor changes). This needs to be revisited now that GitHub 
issue reporting is enabled, as there may well be no ASF Jira issue to link a PR 
against going forward. The resulting bot comments can be confusing.
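
One way to revisit that assumption is to classify PR titles before the bot 
comments. The sketch below is illustrative only: the `ARROW-NNNN:` and 
`MINOR:` title prefixes are assumed from the description above (the exact 
title format the workflows expect is not stated here), and the `GH-NNNN:` 
prefix for GitHub issues is a purely hypothetical convention. The function 
name is also hypothetical.

```python
import re

# Assumed title prefixes; only the Jira and "minor" cases are implied by
# the issue text. The GH- prefix is a hypothetical convention.
JIRA_REF = re.compile(r"^ARROW-(\d+):")
GITHUB_REF = re.compile(r"^GH-(\d+):")
MINOR = re.compile(r"^MINOR:", re.IGNORECASE)


def classify_pr_title(title):
    """Return a (kind, issue_number) tuple for a PR title.

    kind is one of "jira", "github", "minor", or "unlinked";
    issue_number is None when the title references no issue.
    """
    title = title.strip()
    match = JIRA_REF.match(title)
    if match:
        return ("jira", int(match.group(1)))
    match = GITHUB_REF.match(title)
    if match:
        return ("github", int(match.group(1)))
    if MINOR.match(title):
        return ("minor", None)
    return ("unlinked", None)


print(classify_pr_title("ARROW-18380: [Dev] Handle GitHub issue links"))
print(classify_pr_title("Fix a typo"))
```

A bot built along these lines could then choose a different comment for each 
kind, instead of always asking for an ASF Jira issue.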





[jira] [Created] (ARROW-18378) MIGRATION: Disable issue reporting in ASF Jira

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18378:
---

 Summary: MIGRATION: Disable issue reporting in ASF Jira
 Key: ARROW-18378
 URL: https://issues.apache.org/jira/browse/ARROW-18378
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


ARROW-18364 enabled issue reporting for Apache Arrow in GitHub issues. Even 
though existing Jira issues have not yet been migrated and are still being 
worked on in the Jira system, we should assess disabling creation of new issues in 
ASF Jira, and instead pointing users to GitHub issues. This may benefit the 
project by reducing the need to monitor inflow in two discrete systems.





[jira] [Created] (ARROW-18377) MIGRATION: Automate component labels from issue form content

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18377:
---

 Summary: MIGRATION: Automate component labels from issue form 
content
 Key: ARROW-18377
 URL: https://issues.apache.org/jira/browse/ARROW-18377
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


ARROW-18364 added the ability to report issues in GitHub, and includes GitHub 
issue templates with a drop-down component(s) selector. These form elements 
drive the resulting issue markdown only, and cannot dynamically drive issue 
labels. Applying labels automatically requires GitHub Actions, which also have 
a few limitations. First, the issue form does not produce any structured data; 
it only produces the issue description markdown, so a parser is required. 
Second, ASF restricts GitHub Actions to a selection of approved actions. While 
community actions exist to generate structured data from issue forms, it is 
likely that the Apache Arrow project will need to write its own parser and 
label application action.
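
A minimal sketch of such a parser follows, in Python. It assumes that the 
issue form renders each field as a `### <label>` heading followed by the 
value, that multiple drop-down selections are joined with commas, and that 
empty optional fields render as `_No response_`. These rendering details and 
the field label `Component(s)` are assumptions rather than facts taken from 
this issue, and the function name is hypothetical.

```python
import re


def components_from_issue_body(body, heading="Component(s)"):
    """Extract "Component: X" label names from issue description markdown.

    Assumes each form field appears as a "### <label>" heading with its
    value on the following lines, and that selections are comma-separated.
    """
    # Split on level-3 headings; the first chunk precedes any heading.
    sections = re.split(r"^### +", body, flags=re.MULTILINE)
    for section in sections[1:]:
        title, _, content = section.partition("\n")
        if title.strip() == heading:
            values = [value.strip() for value in content.strip().split(",")]
            return [
                "Component: " + value
                for value in values
                if value and value != "_No response_"
            ]
    return []


# Hypothetical issue body, shaped like the assumed form output.
sample_body = (
    "### Describe the bug\n\n"
    "Something broke.\n\n"
    "### Component(s)\n\n"
    "Python, C++\n"
)
print(components_from_issue_body(sample_body))
# prints ['Component: Python', 'Component: C++']
```

The returned names could then be applied as labels by whatever action the 
project is permitted to run.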





[jira] [Created] (ARROW-18376) MIGRATION: Add component labels to GitHub

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18376:
---

 Summary: MIGRATION: Add component labels to GitHub
 Key: ARROW-18376
 URL: https://issues.apache.org/jira/browse/ARROW-18376
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


Similar to ARROW-18375, component labels have been established based on 
existing component values defined in ASF Jira. The following labels are needed:

* Component: Archery
* Component: Benchmarking
* Component: C
* Component: C#
* Component: C++
* Component: C++ - Gandiva
* Component: C++ - Plasma
* Component: Continuous Integration
* Component: Dart
* Component: Developer Tools
* Component: Documentation
* Component: FlightRPC
* Component: Format
* Component: GLib
* Component: Go
* Component: GPU
* Component: Integration
* Component: Java
* Component: JavaScript
* Component: MATLAB
* Component: Packaging
* Component: Parquet
* Component: Python
* Component: R
* Component: Ruby
* Component: Swift
* Component: Website
* Component: Other





[jira] [Created] (ARROW-18375) MIGRATION: Enable GitHub issue type labels

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18375:
---

 Summary: MIGRATION: Enable GitHub issue type labels
 Key: ARROW-18375
 URL: https://issues.apache.org/jira/browse/ARROW-18375
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


As part of enabling GitHub issue reporting, the following labels have been 
defined and need to be added to the repository label options. Without these 
labels added, [new issues|https://github.com/apache/arrow/issues/14692] do not 
get the issue template-defined issue type labels set properly.

Labels:
 * Type: bug
 * Type: enhancement
 * Type: usage





[jira] [Assigned] (ARROW-18373) MIGRATION: Enable multiple component selection in issue templates

2022-11-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-18373:
---

Assignee: Todd Farmer

> MIGRATION: Enable multiple component selection in issue templates
> -
>
> Key: ARROW-18373
> URL: https://issues.apache.org/jira/browse/ARROW-18373
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>
> Per comments in [this merged PR|https://github.com/apache/arrow/pull/14675], 
> we would like to enable selection of multiple components when reporting 
> issues via GitHub issues.
> Additionally, we may want to add the needed Apache license to the issue 
> templates and remove the exclusion rules from rat_exclude_files.txt.





[jira] [Created] (ARROW-18373) MIGRATION: Enable multiple component selection in issue templates

2022-11-21 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18373:
---

 Summary: MIGRATION: Enable multiple component selection in issue 
templates
 Key: ARROW-18373
 URL: https://issues.apache.org/jira/browse/ARROW-18373
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


Per comments in [this merged PR|https://github.com/apache/arrow/pull/14675], we 
would like to enable selection of multiple components when reporting issues via 
GitHub issues.

Additionally, we may want to add the needed Apache license to the issue 
templates and remove the exclusion rules from rat_exclude_files.txt.





[jira] [Assigned] (ARROW-18323) MIGRATION TEST ISSUE #2

2022-11-18 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-18323:
---

Assignee: Todd Farmer

> MIGRATION TEST ISSUE #2
> ---
>
> Key: ARROW-18323
> URL: https://issues.apache.org/jira/browse/ARROW-18323
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>
> This issue was created to help test migration-related process and tooling.





[jira] [Assigned] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-18 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-18364:
---

Assignee: Todd Farmer

> MIGRATION: Update GitHub issue templates to support bug reports and feature 
> requests
> 
>
> Key: ARROW-18364
> URL: https://issues.apache.org/jira/browse/ARROW-18364
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Assignee: Todd Farmer
>Priority: Major
>
> The [GitHub issue creation page for 
> Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to 
> open bug reports in Jira. Now that ASF Infra has disabled self-service 
> registration in Jira, and in light of the pending migration of Apache Arrow 
> issue tracking from ASF Jira to GitHub issues, we should enable bug reports 
> to be submitted via GitHub directly. Issue templates will help distinguish 
> bug reports and feature requests from existing usage assistance questions.





[jira] [Created] (ARROW-18364) MIGRATION: Update GitHub issue templates to support bug reports and feature requests

2022-11-18 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18364:
---

 Summary: MIGRATION: Update GitHub issue templates to support bug 
reports and feature requests
 Key: ARROW-18364
 URL: https://issues.apache.org/jira/browse/ARROW-18364
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


The [GitHub issue creation page for 
Arrow|https://github.com/apache/arrow/issues/new/choose] directs users to open 
bug reports in Jira. Now that ASF Infra has disabled self-service registration 
in Jira, and in light of the pending migration of Apache Arrow issue tracking 
from ASF Jira to GitHub issues, we should enable bug reports to be submitted 
via GitHub directly. Issue templates will help distinguish bug reports and 
feature requests from existing usage assistance questions.





[jira] [Commented] (ARROW-18324) MIGRATION TEST ISSUE #3

2022-11-14 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17634026#comment-17634026
 ] 

Todd Farmer commented on ARROW-18324:
-

Test comment with screenshot attached:

 !screenshot-1.png! 

> MIGRATION TEST ISSUE #3
> ---
>
> Key: ARROW-18324
> URL: https://issues.apache.org/jira/browse/ARROW-18324
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-14-12-50-10-611.png, screenshot-1.png
>
>
> This issue was created to evaluate processes and tools for migrating issue 
> content to GitHub, when the issue contains attachments and inline images.
>  !image-2022-11-14-12-50-10-611.png! 





[jira] [Updated] (ARROW-18324) MIGRATION TEST ISSUE #3

2022-11-14 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18324:

Attachment: screenshot-1.png

> MIGRATION TEST ISSUE #3
> ---
>
> Key: ARROW-18324
> URL: https://issues.apache.org/jira/browse/ARROW-18324
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Todd Farmer
>Priority: Major
> Attachments: image-2022-11-14-12-50-10-611.png, screenshot-1.png
>
>
> This issue was created to evaluate processes and tools for migrating issue 
> content to GitHub, when the issue contains attachments and inline images.
>  !image-2022-11-14-12-50-10-611.png! 





[jira] [Created] (ARROW-18324) MIGRATION TEST ISSUE #3

2022-11-14 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18324:
---

 Summary: MIGRATION TEST ISSUE #3
 Key: ARROW-18324
 URL: https://issues.apache.org/jira/browse/ARROW-18324
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer
 Attachments: image-2022-11-14-12-50-10-611.png, screenshot-1.png

This issue was created to evaluate processes and tools for migrating issue 
content to GitHub, when the issue contains attachments and inline images.

 !image-2022-11-14-12-50-10-611.png! 





[jira] [Created] (ARROW-18323) MIGRATION TEST ISSUE #2

2022-11-14 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18323:
---

 Summary: MIGRATION TEST ISSUE #2
 Key: ARROW-18323
 URL: https://issues.apache.org/jira/browse/ARROW-18323
 Project: Apache Arrow
  Issue Type: Task
Reporter: Todd Farmer


This issue was created to help test migration-related process and tooling.





[jira] [Updated] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18308:

External issue URL: https://github.com/toddfarmer/test_import/issues/99  
(was: https://github.com/toddfarmer/test_import/issues/98)

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ https://issues.apache.org/jira/browse/ARROW-18308 ]


Todd Farmer deleted comment on ARROW-18308:
-

was (Author: JIRAUSER288796):
This issue has been migrated as [issue 
#97|https://github.com/toddfarmer/test_import/issues/97] in GitHub. Please see 
the [migration 
documentation|https://gist.github.com/toddfarmer/12aa88361532d21902818a6044fda4c3]
 for further details.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ https://issues.apache.org/jira/browse/ARROW-18308 ]


Todd Farmer deleted comment on ARROW-18308:
-

was (Author: JIRAUSER288796):
This issue has been migrated as [issue 
#98|https://github.com/toddfarmer/test_import/issues/98] in GitHub. Please see 
the [migration 
documentation|https://gist.github.com/toddfarmer/12aa88361532d21902818a6044fda4c3]
 for further details.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Commented] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632004#comment-17632004
 ] 

Todd Farmer commented on ARROW-18308:
-

This issue has been migrated as [issue 
#99|https://github.com/toddfarmer/test_import/issues/99] in GitHub. Please see 
the [migration 
documentation|https://gist.github.com/toddfarmer/12aa88361532d21902818a6044fda4c3]
 for further details.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Updated] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18308:

External issue URL: https://github.com/toddfarmer/test_import/issues/98  
(was: https://github.com/toddfarmer/test_import/issues/97)

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Commented] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632003#comment-17632003
 ] 

Todd Farmer commented on ARROW-18308:
-

This issue has been migrated as [issue 
#98|https://github.com/toddfarmer/test_import/issues/98] in GitHub. Please see 
the [migration 
documentation|https://gist.github.com/toddfarmer/12aa88361532d21902818a6044fda4c3]
 for further details.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ https://issues.apache.org/jira/browse/ARROW-18308 ]


Todd Farmer deleted comment on ARROW-18308:
-

was (Author: JIRAUSER288796):
This issue has been migrated as [issue 
#96|https://github.com/toddfarmer/test_import/issues/96] in GitHub. Please see 
the [migration 
documentation|https://gist.github.com/toddfarmer/12aa88361532d21902818a6044fda4c3]
 for further details.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Updated] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18308:

External issue URL: https://github.com/toddfarmer/test_import/issues/97  
(was: https://github.com/toddfarmer/test_import/issues/96)

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Commented] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17632002#comment-17632002
 ] 

Todd Farmer commented on ARROW-18308:
-

This issue has been migrated as [issue 
#97|https://github.com/toddfarmer/test_import/issues/97] in GitHub. Please see 
the [migration 
documentation|https://gist.github.com/toddfarmer/12aa88361532d21902818a6044fda4c3]
 for further details.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ https://issues.apache.org/jira/browse/ARROW-18308 ]


Todd Farmer deleted comment on ARROW-18308:
-

was (Author: JIRAUSER288796):
This issue has been migrated as [issue 
#95|https://github.com/toddfarmer/test_import/issues/95] in GitHub.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Updated] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18308:

External issue URL: https://github.com/toddfarmer/test_import/issues/96  
(was: https://github.com/toddfarmer/test_import/issues/95)

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Commented] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631999#comment-17631999
 ] 

Todd Farmer commented on ARROW-18308:
-

This issue has been migrated as [issue 
#96|https://github.com/toddfarmer/test_import/issues/96] in GitHub. Please see 
the [migration 
documentation|https://gist.github.com/toddfarmer/12aa88361532d21902818a6044fda4c3]
 for further details.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ https://issues.apache.org/jira/browse/ARROW-18308 ]


Todd Farmer deleted comment on ARROW-18308:
-

was (Author: JIRAUSER288796):
This issue has been migrated as [issue 
#94|https://github.com/toddfarmer/test_import/issues/94] in GitHub.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Commented] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631998#comment-17631998
 ] 

Todd Farmer commented on ARROW-18308:
-

This issue has been migrated as [issue 
#95|https://github.com/toddfarmer/test_import/issues/95] in GitHub.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Updated] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18308:

External issue URL: https://github.com/toddfarmer/test_import/issues/95  
(was: http://www.test.com)

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Commented] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631997#comment-17631997
 ] 

Todd Farmer commented on ARROW-18308:
-

This issue has been migrated as [issue 
#94|https://github.com/toddfarmer/test_import/issues/94] in GitHub.

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Updated] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-18308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer updated ARROW-18308:

External issue URL: http://www.test.com

> MIGRATION TEST ISSUE
> 
>
> Key: ARROW-18308
> URL: https://issues.apache.org/jira/browse/ARROW-18308
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Todd Farmer
>Priority: Major
>
> This issue will be used to validate certain elements of the process and 
> tooling to migrate issue tracking from Jira to GitHub issues.





[jira] [Created] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18308:
---

 Summary: MIGRATION TEST ISSUE
 Key: ARROW-18308
 URL: https://issues.apache.org/jira/browse/ARROW-18308
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Todd Farmer


This issue will be used to validate certain elements of the process and tooling 
to migrate issue tracking from Jira to GitHub issues.





[jira] [Assigned] (ARROW-16319) [R] [Docs] Document the lubridate functions we support in {arrow}

2022-09-28 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16319:
---

Assignee: (was: Stephanie Hazlitt)

> [R] [Docs] Document the lubridate functions we support in {arrow}
> -
>
> Key: ARROW-16319
> URL: https://issues.apache.org/jira/browse/ARROW-16319
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Affects Versions: 8.0.0
>Reporter: Dragoș Moldovan-Grünfeld
>Priority: Major
>
> Add documentation around the {{lubridate}} functionality supported in 
> {{arrow}}. Could be made up of:
> * a blog post 
> * a more in-depth piece of documentation





[jira] [Assigned] (ARROW-14209) [R] Allow multiple arguments to n_distinct()

2022-09-28 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-14209:
---

Assignee: (was: Dragoș Moldovan-Grünfeld)

> [R] Allow multiple arguments to n_distinct()
> 
>
> Key: ARROW-14209
> URL: https://issues.apache.org/jira/browse/ARROW-14209
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Ian Cook
>Priority: Major
>
> ARROW-13620 and ARROW-14036 added support for the {{n_distinct()}} function 
> in the dplyr verb {{summarise()}} but only with a single argument. Add 
> support for multiple arguments to {{n_distinct()}}. This should return the 
> number of unique combinations of values in the specified columns/expressions.
> See the comment about this here: 
> [https://github.com/apache/arrow/pull/11257#discussion_r720873549]
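
The requested semantics (the number of unique combinations of values across the given columns) can be sketched in plain C++. This is only an illustration under stated assumptions: CountDistinctCombinations is a hypothetical helper, not an Arrow or dplyr function, columns are modelled as integer vectors, and null handling is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow or dplyr): counts the distinct
// row-wise combinations across several equally long integer columns.
std::size_t CountDistinctCombinations(
    const std::vector<std::vector<std::int64_t>>& columns) {
  if (columns.empty()) return 0;
  const std::size_t num_rows = columns.front().size();
  for (const auto& column : columns) {
    assert(column.size() == num_rows);  // every column needs the same length
  }
  // Each row becomes one key holding that row's value from every column.
  std::set<std::vector<std::int64_t>> seen;
  for (std::size_t row = 0; row < num_rows; ++row) {
    std::vector<std::int64_t> key;
    key.reserve(columns.size());
    for (const auto& column : columns) key.push_back(column[row]);
    seen.insert(std::move(key));
  }
  return seen.size();
}
```

With a single column this reduces to the existing single-argument behaviour; with several columns, two rows count as the same only if they agree in every column.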





[jira] [Commented] (ARROW-12311) [Python][R] Expose (hide?) ScanOptions

2022-09-28 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610656#comment-17610656
 ] 

Todd Farmer commented on ARROW-12311:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [Python][R] Expose (hide?) ScanOptions
> --
>
> Key: ARROW-12311
> URL: https://issues.apache.org/jira/browse/ARROW-12311
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python, R
>Reporter: Weston Pace
>Assignee: Weston Pace
>Priority: Major
> Fix For: 10.0.0
>
>
> Currently R completely hides the `ScanOptions` class.
> In Python the class is exposed but the documentation prefers `dataset.scan` 
> (which hides both the scanner and the scan options).
> However, there is some useful information in the `ScanOptions`.  
> Specifically, the projected schema (which is a product of the dataset schema 
> and the projection expression and not easily recreated) and the materialized 
> fields (the list of fields referenced by either the filter or the projection) 
> which might be useful for reporting purposes.
> Currently R uses the projected schema to convert a list of column names into 
> a partition schema.  Python does not rely on either field.
>  
> Options:
>  - Keep the status quo
>  - Expose the ScanOptions object (which itself is exposed via the Scanner)
>  - Expose the interesting fields via the Scanner
>  
> Currently the C++ design is halfway between the latter two (both the 
> projected schema and the options are exposed).  My preference would be the 
> third option.  It raises a further question about how to expose the scanner 
> itself in Python.  Should the user be using ScannerBuilder?  Should they use 
> NewScan?  Should they use the scanner directly at all or should it be hidden?





[jira] [Commented] (ARROW-14138) [R] update metadata when casting a record batch column

2022-09-28 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610654#comment-17610654
 ] 

Todd Farmer commented on ARROW-14138:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [R] update metadata when casting a record batch column
> --
>
> Key: ARROW-14138
> URL: https://issues.apache.org/jira/browse/ARROW-14138
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain Francois
>Assignee: Romain Francois
>Priority: Minor
> Fix For: 10.0.0
>
>
> library(arrow, warn.conflicts = FALSE)
> #> See arrow_info() for available features
> raws <- structure(list(
>   as.raw(c(0x70, 0x65, 0x72, 0x73, 0x6f, 0x6e))
> ), class = c("arrow_binary", "vctrs_vctr", "list"))
> batch <- record_batch(b = raws)
> batch$metadata$r
> #>  'arrow_r_metadata' chr 
> "A\n3\n262147\n197888\n5\nUTF-8\n531\n1\n531\n1\n531\n2\n531\n1\n16\n3\n262153\n12\narrow_binary\n262153\n10\nvc"|
>  __truncated__
> #> List of 1
> #>  $ columns:List of 1
> #>   ..$ b:List of 2
> #>   .. ..$ attributes:List of 1
> #>   .. .. ..$ class: chr [1:3] "arrow_binary" "vctrs_vctr" "list"
> #>   .. ..$ columns   : NULL
> # when casting `b` to a string column, the metadata is kept
> batch$b <- batch$b$cast(utf8())
> batch$metadata$r
> #>  'arrow_r_metadata' chr 
> "A\n3\n262147\n197888\n5\nUTF-8\n531\n1\n531\n1\n531\n2\n531\n1\n16\n3\n262153\n12\narrow_binary\n262153\n10\nvc"|
>  __truncated__
> #> List of 1
> #>  $ columns:List of 1
> #>   ..$ b:List of 2
> #>   .. ..$ attributes:List of 1
> #>   .. .. ..$ class: chr [1:3] "arrow_binary" "vctrs_vctr" "list"
> #>   .. ..$ columns   : NULL
> # but it should not have
> batch2 <- record_batch(b = "string")
> batch2$metadata$r
> #> NULL





[jira] [Commented] (ARROW-14987) [C++]Memory leak while reading parquet file

2022-09-28 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610651#comment-17610651
 ] 

Todd Farmer commented on ARROW-14987:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++]Memory leak while reading parquet file
> ---
>
> Key: ARROW-14987
> URL: https://issues.apache.org/jira/browse/ARROW-14987
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 6.0.1
>Reporter: Qingxiang Chen
>Assignee: Weston Pace
>Priority: Major
>
> When I used parquet to access data, I found that the memory usage was still 
> high after the function ended. I reproduced this problem with the example 
> code shown below:
>  
> {code:c++}
> #include <arrow/api.h>
> #include <arrow/io/api.h>
> #include <parquet/arrow/reader.h>
> #include <parquet/arrow/writer.h>
> #include <parquet/exception.h>
> #include <unistd.h>
> #include <iostream>
> // Build a one-column int64 table with 32 rows.
> std::shared_ptr<arrow::Table> generate_table() {
>   arrow::Int64Builder i64builder;
>   for (int i = 0; i < 32; i++) {
>     PARQUET_THROW_NOT_OK(i64builder.Append(i));
>   }
>   std::shared_ptr<arrow::Array> i64array;
>   PARQUET_THROW_NOT_OK(i64builder.Finish(&i64array));
>   std::shared_ptr<arrow::Schema> schema = arrow::schema(
>       {arrow::field("int", arrow::int64())});
>   return arrow::Table::Make(schema, {i64array});
> }
> // Write the table to a Parquet file with a row group size of 3.
> void write_parquet_file(const arrow::Table& table) {
>   std::shared_ptr<arrow::io::FileOutputStream> outfile;
>   PARQUET_ASSIGN_OR_THROW(
>       outfile,
>       arrow::io::FileOutputStream::Open("parquet-arrow-example.parquet"));
>   PARQUET_THROW_NOT_OK(
>       parquet::arrow::WriteTable(table, arrow::default_memory_pool(),
>                                  outfile, 3));
> }
> // Read the whole file back into a table using the default memory pool.
> void read_whole_file() {
>   std::cout << "Reading parquet-arrow-example.parquet at once" << std::endl;
>   std::shared_ptr<arrow::io::ReadableFile> infile;
>   PARQUET_ASSIGN_OR_THROW(
>       infile,
>       arrow::io::ReadableFile::Open("parquet-arrow-example.parquet",
>                                     arrow::default_memory_pool()));
>   std::unique_ptr<parquet::arrow::FileReader> reader;
>   PARQUET_THROW_NOT_OK(
>       parquet::arrow::OpenFile(infile, arrow::default_memory_pool(),
>                                &reader));
>   std::shared_ptr<arrow::Table> table;
>   PARQUET_THROW_NOT_OK(reader->ReadTable(&table));
>   std::cout << "Loaded " << table->num_rows() << " rows in "
>             << table->num_columns() << " columns." << std::endl;
> }
> int main(int argc, char** argv) {
>   std::shared_ptr<arrow::Table> table = generate_table();
>   write_parquet_file(*table);
>   std::cout << "start " << std::endl;
>   read_whole_file();
>   std::cout << "end " << std::endl;
>   sleep(100);  // keep the process alive so memory usage can be observed
> }
> {code}
> After read_whole_file() returns, during the sleep, the memory usage is still 
> more than 100M and does not drop. When I increase the data volume by 5 times, 
> the memory usage is about 500M, and it does not drop either.
> I want to know whether this memory is cached by the memory pool, or whether 
> it is a memory leak. If there is no memory leak, how can I set the memory 
> pool size or release the memory?





[jira] [Commented] (ARROW-16155) [R] lubridate functions for 9.0.0

2022-09-28 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610655#comment-17610655
 ] 

Todd Farmer commented on ARROW-16155:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [R] lubridate functions for 9.0.0
> -
>
> Key: ARROW-16155
> URL: https://issues.apache.org/jira/browse/ARROW-16155
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 8.0.0
>Reporter: Alessandro Molina
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Major
>
> Umbrella ticket for lubridate functions in 9.0.0
> Future work that is not going to happen in v9 is recorded under 
> https://issues.apache.org/jira/browse/ARROW-16841





[jira] [Assigned] (ARROW-14138) [R] update metadata when casting a record batch column

2022-09-28 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-14138:
---

Assignee: (was: Romain Francois)

> [R] update metadata when casting a record batch column
> --
>
> Key: ARROW-14138
> URL: https://issues.apache.org/jira/browse/ARROW-14138
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain Francois
>Priority: Minor
> Fix For: 10.0.0
>
>
> library(arrow, warn.conflicts = FALSE)
> #> See arrow_info() for available features
> raws <- structure(list(
>   as.raw(c(0x70, 0x65, 0x72, 0x73, 0x6f, 0x6e))
> ), class = c("arrow_binary", "vctrs_vctr", "list"))
> batch <- record_batch(b = raws)
> batch$metadata$r
> #>  'arrow_r_metadata' chr 
> "A\n3\n262147\n197888\n5\nUTF-8\n531\n1\n531\n1\n531\n2\n531\n1\n16\n3\n262153\n12\narrow_binary\n262153\n10\nvc"|
>  __truncated__
> #> List of 1
> #>  $ columns:List of 1
> #>   ..$ b:List of 2
> #>   .. ..$ attributes:List of 1
> #>   .. .. ..$ class: chr [1:3] "arrow_binary" "vctrs_vctr" "list"
> #>   .. ..$ columns   : NULL
> # when casting `b` to a string column, the metadata is kept
> batch$b <- batch$b$cast(utf8())
> batch$metadata$r
> #>  'arrow_r_metadata' chr 
> "A\n3\n262147\n197888\n5\nUTF-8\n531\n1\n531\n1\n531\n2\n531\n1\n16\n3\n262153\n12\narrow_binary\n262153\n10\nvc"|
>  __truncated__
> #> List of 1
> #>  $ columns:List of 1
> #>   ..$ b:List of 2
> #>   .. ..$ attributes:List of 1
> #>   .. .. ..$ class: chr [1:3] "arrow_binary" "vctrs_vctr" "list"
> #>   .. ..$ columns   : NULL
> # but it should not have
> batch2 <- record_batch(b = "string")
> batch2$metadata$r
> #> NULL





[jira] [Assigned] (ARROW-14987) [C++]Memory leak while reading parquet file

2022-09-28 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-14987:
---

Assignee: (was: Weston Pace)

> [C++]Memory leak while reading parquet file
> ---
>
> Key: ARROW-14987
> URL: https://issues.apache.org/jira/browse/ARROW-14987
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 6.0.1
>Reporter: Qingxiang Chen
>Priority: Major
>
> When I used parquet to access data, I found that the memory usage was still 
> high after the function ended. I reproduced this problem with the example 
> code shown below:
>  
> {code:c++}
> #include <arrow/api.h>
> #include <arrow/io/api.h>
> #include <parquet/arrow/reader.h>
> #include <parquet/arrow/writer.h>
> #include <parquet/exception.h>
> #include <unistd.h>
> #include <iostream>
> // Build a one-column int64 table with 32 rows.
> std::shared_ptr<arrow::Table> generate_table() {
>   arrow::Int64Builder i64builder;
>   for (int i = 0; i < 32; i++) {
>     PARQUET_THROW_NOT_OK(i64builder.Append(i));
>   }
>   std::shared_ptr<arrow::Array> i64array;
>   PARQUET_THROW_NOT_OK(i64builder.Finish(&i64array));
>   std::shared_ptr<arrow::Schema> schema = arrow::schema(
>       {arrow::field("int", arrow::int64())});
>   return arrow::Table::Make(schema, {i64array});
> }
> // Write the table to a Parquet file with a row group size of 3.
> void write_parquet_file(const arrow::Table& table) {
>   std::shared_ptr<arrow::io::FileOutputStream> outfile;
>   PARQUET_ASSIGN_OR_THROW(
>       outfile,
>       arrow::io::FileOutputStream::Open("parquet-arrow-example.parquet"));
>   PARQUET_THROW_NOT_OK(
>       parquet::arrow::WriteTable(table, arrow::default_memory_pool(),
>                                  outfile, 3));
> }
> // Read the whole file back into a table using the default memory pool.
> void read_whole_file() {
>   std::cout << "Reading parquet-arrow-example.parquet at once" << std::endl;
>   std::shared_ptr<arrow::io::ReadableFile> infile;
>   PARQUET_ASSIGN_OR_THROW(
>       infile,
>       arrow::io::ReadableFile::Open("parquet-arrow-example.parquet",
>                                     arrow::default_memory_pool()));
>   std::unique_ptr<parquet::arrow::FileReader> reader;
>   PARQUET_THROW_NOT_OK(
>       parquet::arrow::OpenFile(infile, arrow::default_memory_pool(),
>                                &reader));
>   std::shared_ptr<arrow::Table> table;
>   PARQUET_THROW_NOT_OK(reader->ReadTable(&table));
>   std::cout << "Loaded " << table->num_rows() << " rows in "
>             << table->num_columns() << " columns." << std::endl;
> }
> int main(int argc, char** argv) {
>   std::shared_ptr<arrow::Table> table = generate_table();
>   write_parquet_file(*table);
>   std::cout << "start " << std::endl;
>   read_whole_file();
>   std::cout << "end " << std::endl;
>   sleep(100);  // keep the process alive so memory usage can be observed
> }
> {code}
> After read_whole_file() returns, during the sleep, the memory usage is still 
> more than 100M and does not drop. When I increase the data volume by 5 times, 
> the memory usage is about 500M, and it does not drop either.
> I want to know whether this memory is cached by the memory pool, or whether 
> it is a memory leak. If there is no memory leak, how can I set the memory 
> pool size or release the memory?





[jira] [Assigned] (ARROW-16155) [R] lubridate functions for 9.0.0

2022-09-28 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16155:
---

Assignee: (was: Dragoș Moldovan-Grünfeld)

> [R] lubridate functions for 9.0.0
> -
>
> Key: ARROW-16155
> URL: https://issues.apache.org/jira/browse/ARROW-16155
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 8.0.0
>Reporter: Alessandro Molina
>Priority: Major
>
> Umbrella ticket for lubridate functions in 9.0.0
> Future work that is not going to happen in v9 is recorded under 
> https://issues.apache.org/jira/browse/ARROW-16841





[jira] [Commented] (ARROW-16319) [R] [Docs] Document the lubridate functions we support in {arrow}

2022-09-28 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610653#comment-17610653
 ] 

Todd Farmer commented on ARROW-16319:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [R] [Docs] Document the lubridate functions we support in {arrow}
> -
>
> Key: ARROW-16319
> URL: https://issues.apache.org/jira/browse/ARROW-16319
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Affects Versions: 8.0.0
>Reporter: Dragoș Moldovan-Grünfeld
>Assignee: Stephanie Hazlitt
>Priority: Major
>
> Add documentation around the {{lubridate}} functionality supported in 
> {{arrow}}. Could be made up of:
> * a blog post 
> * a more in-depth piece of documentation





[jira] [Assigned] (ARROW-14588) [R] Create an arrow-specific checklist for a CRAN release

2022-09-28 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-14588:
---

Assignee: (was: Dragoș Moldovan-Grünfeld)

> [R] Create an arrow-specific checklist for a CRAN release  
> ---
>
> Key: ARROW-14588
> URL: https://issues.apache.org/jira/browse/ARROW-14588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Dragoș Moldovan-Grünfeld
>Priority: Minor
>
> This would adapt and implement the functionality of 
> {{usethis::use_release_issue()}} for {{arrow}}'s specific context.





[jira] [Assigned] (ARROW-12311) [Python][R] Expose (hide?) ScanOptions

2022-09-28 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-12311:
---

Assignee: (was: Weston Pace)

> [Python][R] Expose (hide?) ScanOptions
> --
>
> Key: ARROW-12311
> URL: https://issues.apache.org/jira/browse/ARROW-12311
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python, R
>Reporter: Weston Pace
>Priority: Major
> Fix For: 10.0.0
>
>
> Currently R completely hides the `ScanOptions` class.
> In Python the class is exposed but the documentation prefers `dataset.scan` 
> (which hides both the scanner and the scan options).
> However, there is some useful information in the `ScanOptions`.  
> Specifically, the projected schema (which is a product of the dataset schema 
> and the projection expression and not easily recreated) and the materialized 
> fields (the list of fields referenced by either the filter or the projection) 
> which might be useful for reporting purposes.
> Currently R uses the projected schema to convert a list of column names into 
> a partition schema.  Python does not rely on either field.
>  
> Options:
>  - Keep the status quo
>  - Expose the ScanOptions object (which itself is exposed via the Scanner)
>  - Expose the interesting fields via the Scanner
>  
> Currently the C++ design is halfway between the latter two (both the 
> projected schema and the options are exposed).  My preference would be the 
> third option.  It raises a further question about how to expose the scanner 
> itself in Python.  Should the user be using ScannerBuilder?  Should they use 
> NewScan?  Should they use the scanner directly at all or should it be hidden?





[jira] [Commented] (ARROW-14209) [R] Allow multiple arguments to n_distinct()

2022-09-28 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610650#comment-17610650
 ] 

Todd Farmer commented on ARROW-14209:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [R] Allow multiple arguments to n_distinct()
> 
>
> Key: ARROW-14209
> URL: https://issues.apache.org/jira/browse/ARROW-14209
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Ian Cook
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Major
>
> ARROW-13620 and ARROW-14036 added support for the {{n_distinct()}} function 
> in the dplyr verb {{summarise()}} but only with a single argument. Add 
> support for multiple arguments to {{n_distinct()}}. This should return the 
> number of unique combinations of values in the specified columns/expressions.
> See the comment about this here: 
> [https://github.com/apache/arrow/pull/11257#discussion_r720873549]





[jira] [Commented] (ARROW-14588) [R] Create an arrow-specific checklist for a CRAN release

2022-09-28 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610652#comment-17610652
 ] 

Todd Farmer commented on ARROW-14588:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [R] Create an arrow-specific checklist for a CRAN release  
> ---
>
> Key: ARROW-14588
> URL: https://issues.apache.org/jira/browse/ARROW-14588
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Dragoș Moldovan-Grünfeld
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Minor
>
> This would adapt and implement the functionality of 
> {{usethis::use_release_issue()}} for {{arrow}}'s specific context.





[jira] [Assigned] (ARROW-13028) [C++] CSV add convert option to attempt 32bit number inferences

2022-09-27 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-13028:
---

Assignee: (was: Nate Clark)

> [C++] CSV add convert option to attempt 32bit number inferences
> ---
>
> Key: ARROW-13028
> URL: https://issues.apache.org/jira/browse/ARROW-13028
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Nate Clark
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the CSV reader infers column types, numeric columns are always 64 bit. 
> For large data sets it could be better to use 32 bit types to reduce overall 
> memory use. To do this it would be useful to add an option to ConvertOptions 
> that tries 32 bit numbers before 64 bit. By default this option would be 
> disabled.
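
A rough sketch of the proposed inference order, written against the standard library only. The names InferenceOptions, try_32bit_first and InferColumnType are assumptions made for illustration; none of them is an existing member of arrow::csv::ConvertOptions, and a real implementation would also cover floating point, nulls and other types.

```cpp
#include <charconv>
#include <cstdint>
#include <string>
#include <system_error>
#include <vector>

enum class InferredType { kInt32, kInt64, kString };

// Hypothetical stand-in for the proposed option (illustrative name only).
struct InferenceOptions {
  bool try_32bit_first = false;  // disabled by default, as the issue proposes
};

// True if the whole cell parses as an integer of type T without overflow.
template <typename T>
bool ParsesAs(const std::string& text) {
  T value{};
  const char* begin = text.data();
  const char* end = begin + text.size();
  const auto result = std::from_chars(begin, end, value);
  return result.ec == std::errc() && result.ptr == end;
}

// Picks the narrowest enabled integer type that fits every cell and falls
// back to string as soon as one cell is not a 64-bit integer.
InferredType InferColumnType(const std::vector<std::string>& cells,
                             const InferenceOptions& options) {
  bool fits_int32 = options.try_32bit_first;
  for (const auto& cell : cells) {
    if (fits_int32 && ParsesAs<std::int32_t>(cell)) continue;
    fits_int32 = false;  // widen once, then stay at 64 bit
    if (!ParsesAs<std::int64_t>(cell)) return InferredType::kString;
  }
  return fits_int32 ? InferredType::kInt32 : InferredType::kInt64;
}
```

With the option left at its default, every integer column is still inferred as 64 bit, so existing behaviour would be unchanged.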





[jira] [Commented] (ARROW-14289) [C++] Change Scanner::Head to return a RecordBatchReader

2022-09-27 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610150#comment-17610150
 ] 

Todd Farmer commented on ARROW-14289:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Change Scanner::Head to return a RecordBatchReader
> 
>
> Key: ARROW-14289
> URL: https://issues.apache.org/jira/browse/ARROW-14289
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, R
>Reporter: Neal Richardson
>Assignee: Weston Pace
>Priority: Major
>
> Following ARROW-9731 and ARROW-13893. This would make it more natural to work 
> with ExecPlans that return a RecordBatchReader when you Run them. 
> Alternatively, we could move the business to RecordBatchReader::Head.
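
The idea of a Head that itself returns a reader can be illustrated with a small wrapper that streams batches and stops after a row limit. This is a sketch with hypothetical stand-in types (Batch, BatchReader, VectorReader, HeadReader); it does not use the real arrow::RecordBatchReader or Scanner interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a record batch and a batch reader.
using Batch = std::vector<int>;

class BatchReader {
 public:
  virtual ~BatchReader() = default;
  // Returns the next batch, or std::nullopt once the stream is exhausted.
  virtual std::optional<Batch> Next() = 0;
};

// Simple in-memory source used to drive the example.
class VectorReader : public BatchReader {
 public:
  explicit VectorReader(std::vector<Batch> batches)
      : batches_(std::move(batches)) {}
  std::optional<Batch> Next() override {
    if (position_ >= batches_.size()) return std::nullopt;
    return batches_[position_++];
  }

 private:
  std::vector<Batch> batches_;
  std::size_t position_ = 0;
};

// Wraps another reader and stops after `limit` rows, truncating the batch
// that crosses the limit. Later batches are never pulled from the source.
class HeadReader : public BatchReader {
 public:
  HeadReader(std::unique_ptr<BatchReader> source, std::size_t limit)
      : source_(std::move(source)), remaining_(limit) {
    assert(source_ != nullptr);
  }
  std::optional<Batch> Next() override {
    if (remaining_ == 0) return std::nullopt;
    std::optional<Batch> batch = source_->Next();
    if (!batch) return std::nullopt;
    if (batch->size() > remaining_) batch->resize(remaining_);
    remaining_ -= batch->size();
    return batch;
  }

 private:
  std::unique_ptr<BatchReader> source_;
  std::size_t remaining_;
};
```

Because the wrapper is itself a reader, it composes with anything else that consumes a reader, which is the property the issue is after.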





[jira] [Assigned] (ARROW-15459) [C++] Unable to build Arrow C++ on osx arm64 inside conda env because of Invalid configuration `arm64-apple-darwin20.0.0': machine `arm64-apple' not recognized and arro

2022-09-27 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-15459:
---

Assignee: (was: Elena Henderson)

> [C++] Unable to build Arrow C++ on osx arm64 inside conda env because of 
> Invalid configuration `arm64-apple-darwin20.0.0': machine `arm64-apple' not 
> recognized and arrow/cpp/arm64-apple-darwin20.0.0-ar: No such file or 
> directory
> 
>
> Key: ARROW-15459
> URL: https://issues.apache.org/jira/browse/ARROW-15459
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Elena Henderson
>Priority: Major
>  Labels: osx-arm64
> Attachments: logs
>
>
> Steps to reproduce this issue on osx arm64:
> {code:bash}
> git clone https://github.com/apache/arrow.git
> cd arrow/cpp
> brew update && brew install node && brew bundle --file=Brewfile
> cd ..
> mamba create -y -n arrow-commit -c conda-forge \
>   --file ci/conda_env_unix.txt \
>   --file ci/conda_env_cpp.txt \
>   --file ci/conda_env_python.txt \
>   compilers \
>   python=3.8 \
>   pandas \
>   aws-sdk-cpp \
>   r
> mamba activate arrow-commit
> pip install -r python/requirements-build.txt -r python/requirements-test.txt
> export ARROW_BUILD_TESTS=OFF
> export ARROW_BUILD_TYPE=release
> export ARROW_DEPENDENCY_SOURCE=AUTO
> export ARROW_DATASET=ON
> export ARROW_DEFAULT_MEMORY_POOL=mimalloc
> export ARROW_ENABLE_UNSAFE_MEMORY_ACCESS=true
> export ARROW_ENABLE_NULL_CHECK_FOR_GET=false
> export ARROW_FLIGHT=OFF
> export ARROW_GANDIVA=OFF
> export ARROW_HDFS=ON
> export ARROW_HOME=$CONDA_PREFIX
> export ARROW_INSTALL_NAME_RPATH=OFF
> export ARROW_MIMALLOC=ON
> export ARROW_NO_DEPRECATED_API=ON
> export ARROW_ORC=ON
> export ARROW_PARQUET=ON
> export ARROW_PLASMA=ON
> export ARROW_PYTHON=ON
> export ARROW_S3=ON
> export ARROW_USE_ASAN=OFF
> export ARROW_USE_CCACHE=ON
> export ARROW_USE_UBSAN=OFF
> export ARROW_WITH_BROTLI=ON
> export ARROW_WITH_BZ2=ON
> export ARROW_WITH_LZ4=ON
> export ARROW_WITH_SNAPPY=ON
> export ARROW_WITH_ZLIB=ON
> export ARROW_WITH_ZSTD=ON
> export GTest_SOURCE=BUNDLED
> export ORC_SOURCE=BUNDLED
> export PARQUET_BUILD_EXAMPLES=ON
> export PARQUET_BUILD_EXECUTABLES=ON
> export PYTHON=python
> export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
> ci/scripts/cpp_build.sh $(pwd) $(pwd) 
> {code}
>  
> Error (full logs are attached):
> {code:java}
> ...
> checking size of void *... 8
> checking size of int... 4
> checking size of long... 8
> checking size of long long... 8
> checking size of intmax_t... 8
> checking build system type... 
> -- stderr output is:
> Invalid configuration `arm64-apple-darwin20.0.0': machine `arm64-apple' not 
> recognized
> configure: error: /bin/sh build-aux/config.sub arm64-apple-darwin20.0.0 failed
> CMake Error at 
> /Users/voltrondata/arrow/cpp/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-configure-RELEASE.cmake:47
>  (message):
>   Stopping after outputting logs.
> [31/380] Performing configure step for 'orc_ep'
> ninja: build stopped: subcommand failed. {code}
>  
>  





[jira] [Commented] (ARROW-15459) [C++] Unable to build Arrow C++ on osx arm64 inside conda env because of Invalid configuration `arm64-apple-darwin20.0.0': machine `arm64-apple' not recognized and arr

2022-09-27 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610151#comment-17610151
 ] 

Todd Farmer commented on ARROW-15459:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Unable to build Arrow C++ on osx arm64 inside conda env because of 
> Invalid configuration `arm64-apple-darwin20.0.0': machine `arm64-apple' not 
> recognized and arrow/cpp/arm64-apple-darwin20.0.0-ar: No such file or 
> directory
> 
>
> Key: ARROW-15459
> URL: https://issues.apache.org/jira/browse/ARROW-15459
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Elena Henderson
>Assignee: Elena Henderson
>Priority: Major
>  Labels: osx-arm64
> Attachments: logs
>
>
> Steps to reproduce this issue on osx arm64:
> {code:bash}
> git clone https://github.com/apache/arrow.git
> cd arrow/cpp
> brew update && brew install node && brew bundle --file=Brewfile
> cd ..
> mamba create -y -n arrow-commit -c conda-forge \
>   --file ci/conda_env_unix.txt \
>   --file ci/conda_env_cpp.txt \
>   --file ci/conda_env_python.txt \
>   compilers \
>   python=3.8 \
>   pandas \
>   aws-sdk-cpp \
>   r
> mamba activate arrow-commit
> pip install -r python/requirements-build.txt -r python/requirements-test.txt
> export ARROW_BUILD_TESTS=OFF
> export ARROW_BUILD_TYPE=release
> export ARROW_DEPENDENCY_SOURCE=AUTO
> export ARROW_DATASET=ON
> export ARROW_DEFAULT_MEMORY_POOL=mimalloc
> export ARROW_ENABLE_UNSAFE_MEMORY_ACCESS=true
> export ARROW_ENABLE_NULL_CHECK_FOR_GET=false
> export ARROW_FLIGHT=OFF
> export ARROW_GANDIVA=OFF
> export ARROW_HDFS=ON
> export ARROW_HOME=$CONDA_PREFIX
> export ARROW_INSTALL_NAME_RPATH=OFF
> export ARROW_MIMALLOC=ON
> export ARROW_NO_DEPRECATED_API=ON
> export ARROW_ORC=ON
> export ARROW_PARQUET=ON
> export ARROW_PLASMA=ON
> export ARROW_PYTHON=ON
> export ARROW_S3=ON
> export ARROW_USE_ASAN=OFF
> export ARROW_USE_CCACHE=ON
> export ARROW_USE_UBSAN=OFF
> export ARROW_WITH_BROTLI=ON
> export ARROW_WITH_BZ2=ON
> export ARROW_WITH_LZ4=ON
> export ARROW_WITH_SNAPPY=ON
> export ARROW_WITH_ZLIB=ON
> export ARROW_WITH_ZSTD=ON
> export GTest_SOURCE=BUNDLED
> export ORC_SOURCE=BUNDLED
> export PARQUET_BUILD_EXAMPLES=ON
> export PARQUET_BUILD_EXECUTABLES=ON
> export PYTHON=python
> export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
> ci/scripts/cpp_build.sh $(pwd) $(pwd) 
> {code}
>  
> Error (full logs are attached):
> {code:java}
> ...
> checking size of void *... 8
> checking size of int... 4
> checking size of long... 8
> checking size of long long... 8
> checking size of intmax_t... 8
> checking build system type... 
> -- stderr output is:
> Invalid configuration `arm64-apple-darwin20.0.0': machine `arm64-apple' not 
> recognized
> configure: error: /bin/sh build-aux/config.sub arm64-apple-darwin20.0.0 failed
> CMake Error at 
> /Users/voltrondata/arrow/cpp/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-configure-RELEASE.cmake:47
>  (message):
>   Stopping after outputting logs.
> [31/380] Performing configure step for 'orc_ep'
> ninja: build stopped: subcommand failed. {code}
>  
>  





[jira] [Assigned] (ARROW-14289) [C++] Change Scanner::Head to return a RecordBatchReader

2022-09-27 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-14289:
---

Assignee: (was: Weston Pace)

> [C++] Change Scanner::Head to return a RecordBatchReader
> 
>
> Key: ARROW-14289
> URL: https://issues.apache.org/jira/browse/ARROW-14289
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, R
>Reporter: Neal Richardson
>Priority: Major
>
> Following ARROW-9731 and ARROW-13893. This would make it more natural to work 
> with ExecPlans that return a RecordBatchReader when you Run them. 
> Alternatively, we could move the business to RecordBatchReader::Head.





[jira] [Commented] (ARROW-13028) [C++] CSV add convert option to attempt 32bit number inferences

2022-09-27 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610149#comment-17610149
 ] 

Todd Farmer commented on ARROW-13028:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] CSV add convert option to attempt 32bit number inferences
> ---
>
> Key: ARROW-13028
> URL: https://issues.apache.org/jira/browse/ARROW-13028
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Nate Clark
>Assignee: Nate Clark
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the CSV reader infers types, numbers are always inferred as 64-bit. For 
> large data sets it could be better to use 32-bit types to reduce overall 
> memory use. To do this, it would be useful to add an option to ConvertOptions 
> that tries 32-bit numbers before 64-bit. By default this option would be 
> disabled.
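
The inference order proposed above can be sketched in plain Python. This is only an illustration of the idea, not Arrow code: the function name `infer_integer_type` and the `try_32bit` flag are hypothetical, and nothing here is confirmed to exist in ConvertOptions.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def infer_integer_type(cells, try_32bit=False):
    """Return "int32", "int64", or None for a column of CSV string cells.

    try_32bit is a hypothetical flag mirroring the proposed option; it is
    off by default, so the result stays 64-bit unless the caller opts in.
    """
    try:
        values = [int(cell) for cell in cells]
    except ValueError:
        return None  # not an integer column; other types would be tried next
    if try_32bit and all(INT32_MIN <= v <= INT32_MAX for v in values):
        return "int32"
    if all(INT64_MIN <= v <= INT64_MAX for v in values):
        return "int64"
    return None
```

With the flag left off the sketch behaves like the current 64-bit inference; with it on, a column falls back to 64-bit as soon as one value does not fit in 32 bits.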





[jira] [Assigned] (ARROW-15751) [Docs][Python] Restructure developers/python.rst

2022-09-26 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-15751:
---

Assignee: (was: Alenka Frim)

> [Docs][Python] Restructure developers/python.rst
> 
>
> Key: ARROW-15751
> URL: https://issues.apache.org/jira/browse/ARROW-15751
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Alenka Frim
>Priority: Minor
>
> Restructure _developers/python.rst_ page to use sphinx tabs and panels in 
> order to make it more structured and clear for the user/contributor.





[jira] [Commented] (ARROW-15751) [Docs][Python] Restructure developers/python.rst

2022-09-26 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609591#comment-17609591
 ] 

Todd Farmer commented on ARROW-15751:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [Docs][Python] Restructure developers/python.rst
> 
>
> Key: ARROW-15751
> URL: https://issues.apache.org/jira/browse/ARROW-15751
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Python
>Reporter: Alenka Frim
>Assignee: Alenka Frim
>Priority: Minor
>
> Restructure _developers/python.rst_ page to use sphinx tabs and panels in 
> order to make it more structured and clear for the user/contributor.





[jira] [Commented] (ARROW-16915) [C++] Unify approaches to attach schemas on record batches exiting Acero

2022-09-26 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609590#comment-17609590
 ] 

Todd Farmer commented on ARROW-16915:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Unify approaches to attach schemas on record batches exiting Acero
> 
>
> Key: ARROW-16915
> URL: https://issues.apache.org/jira/browse/ARROW-16915
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> Internally, Acero uses ExecBatch everywhere, without schemas.  Originally, 
> the various exit nodes would simply attach a boring schema based on the 
> output data types and an inference of field names.
> However, as part of Substrait integration and other improvements the various 
> sink nodes are being amended to support:
>  * Custom field names
>  * Custom metadata
> However, the current implementation is somewhat inconsistent.
> SinkNode:
>  - Does not support custom field names or metadata
> ConsumingSinkNode:
>  - Supports custom names but not custom metadata
> WriteNode:
>  - Supports custom metadata but not custom names
> We should create a {{SinkNodeOptions}} base class that supports custom names 
> and custom metadata and we should have a single place with utility methods 
> for attaching a schema to an outgoing exec batch.  Then all of our sink nodes 
> should use this single tool for modifying outgoing batches.
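
The proposal above (one options type plus one shared helper for attaching a schema) can be sketched as follows. This is a Python illustration of the design, not Acero code: `attach_schema`, the field layout of `SinkNodeOptions`, and the generated `f0`, `f1`, ... fallback names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SinkNodeOptions:
    """Hypothetical shared options: every sink accepts both settings."""
    field_names: Optional[List[str]] = None
    metadata: Dict[str, str] = field(default_factory=dict)


def attach_schema(column_types, options):
    """Build the outgoing schema for a batch with the given column types.

    Returns (fields, metadata), where fields is a list of (name, type).
    """
    names = options.field_names
    if names is None:
        # Fall back to generated names (an assumed naming scheme).
        names = [f"f{i}" for i in range(len(column_types))]
    if len(names) != len(column_types):
        raise ValueError("expected one field name per column")
    return list(zip(names, column_types)), dict(options.metadata)
```

Because every sink would call the same helper, support for custom names and custom metadata could not drift apart between node types.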





[jira] [Assigned] (ARROW-16915) [C++] Unify approaches to attach schemas on record batches exiting Acero

2022-09-26 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16915:
---

Assignee: (was: Vibhatha Lakmal Abeykoon)

> [C++] Unify approaches to attach schemas on record batches exiting Acero
> 
>
> Key: ARROW-16915
> URL: https://issues.apache.org/jira/browse/ARROW-16915
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Priority: Major
>
> Internally, Acero uses ExecBatch everywhere, without schemas.  Originally, 
> the various exit nodes would simply attach a boring schema based on the 
> output data types and an inference of field names.
> However, as part of Substrait integration and other improvements the various 
> sink nodes are being amended to support:
>  * Custom field names
>  * Custom metadata
> However, the current implementation is somewhat inconsistent.
> SinkNode:
>  - Does not support custom field names or metadata
> ConsumingSinkNode:
>  - Supports custom names but not custom metadata
> WriteNode:
>  - Supports custom metadata but not custom names
> We should create a {{SinkNodeOptions}} base class that supports custom names 
> and custom metadata and we should have a single place with utility methods 
> for attaching a schema to an outgoing exec batch.  Then all of our sink nodes 
> should use this single tool for modifying outgoing batches.





[jira] [Assigned] (ARROW-14202) [C++] A more RAM-efficient top-k sink node

2022-09-23 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-14202:
---

Assignee: (was: Ariana Villegas)

> [C++] A more RAM-efficient top-k sink node
> --
>
> Key: ARROW-14202
> URL: https://issues.apache.org/jira/browse/ARROW-14202
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 7.0.0
>Reporter: Alexander Ocsa
>Priority: Major
>  Labels: query-engine
>
> Mentioned here:
> https://github.com/apache/arrow/pull/11274#pullrequestreview-768267959
> For example, a top-k implementation could periodically (when batches_ has 
> some configurable # of rows) run through and discard data. The way it is 
> written now it would still require me to buffer the entire dataset in memory 
> (and/or spillover).
>  
>  
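
The periodic-discard idea described above can be sketched in plain Python. This is a minimal illustration rather than the Arrow C++ sink node; the class name `TopKBuffer` and the `max_buffered_rows` parameter (standing in for the "configurable # of rows") are hypothetical.

```python
import heapq


class TopKBuffer:
    """Keep the k largest rows seen so far, pruning once the buffer grows.

    max_buffered_rows is a hypothetical knob standing in for the
    "configurable # of rows" mentioned above.
    """

    def __init__(self, k, max_buffered_rows, key=lambda row: row):
        self.k = k
        self.max_buffered_rows = max(max_buffered_rows, k)
        self.key = key
        self.rows = []

    def consume(self, batch):
        self.rows.extend(batch)
        if len(self.rows) > self.max_buffered_rows:
            # Discard everything that can no longer be in the top k.
            self.rows = heapq.nlargest(self.k, self.rows, key=self.key)

    def finish(self):
        return heapq.nlargest(self.k, self.rows, key=self.key)
```

Memory use is then bounded by roughly `max_buffered_rows` plus one incoming batch, instead of the size of the whole dataset.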





[jira] [Commented] (ARROW-14202) [C++] A more RAM-efficient top-k sink node

2022-09-23 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608846#comment-17608846
 ] 

Todd Farmer commented on ARROW-14202:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] A more RAM-efficient top-k sink node
> --
>
> Key: ARROW-14202
> URL: https://issues.apache.org/jira/browse/ARROW-14202
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 7.0.0
>Reporter: Alexander Ocsa
>Assignee: Ariana Villegas
>Priority: Major
>  Labels: query-engine
>
> Mentioned here:
> https://github.com/apache/arrow/pull/11274#pullrequestreview-768267959
> For example, a top-k implementation could periodically (when batches_ has 
> some configurable # of rows) run through and discard data. The way it is 
> written now it would still require me to buffer the entire dataset in memory 
> (and/or spillover).
>  
>  





[jira] [Closed] (ARROW-14077) Compute IR source consumer

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14077.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> Compute IR source consumer
> --
>
> Key: ARROW-14077
> URL: https://issues.apache.org/jira/browse/ARROW-14077
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> This task tracks the implementation of the source IR consumer in Arrow C++.





[jira] [Closed] (ARROW-14144) [C++] Compile compute IR flatbuffers in CI

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14144.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [C++] Compile compute IR flatbuffers in CI
> --
>
> Key: ARROW-14144
> URL: https://issues.apache.org/jira/browse/ARROW-14144
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> We should compile the compute IR flatbuffers in CI as a sanity check for PRs 
> that change the compute IR.





[jira] [Closed] (ARROW-14394) [IR] API for providing a catalog to Arrow compute

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14394.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [IR] API for providing a catalog to Arrow compute
> -
>
> Key: ARROW-14394
> URL: https://issues.apache.org/jira/browse/ARROW-14394
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Compute IR, Python
>Reporter: Phillip Cloud
>Priority: Major
>
> Compute IR producers and consumers need a way to consume metadata from some 
> kind of catalog to get answers to questions like "What tables can I query?", 
> "What is the schema of table {{X}}?", and others.
> This JIRA is for tracking the work of creating this interface.
> [~bkietz]
> [~kszucs]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-14573) [IR] Add check in CI for substrait codegen dev workflow

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14573.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [IR] Add check in CI for substrait codegen dev workflow
> ---
>
> Key: ARROW-14573
> URL: https://issues.apache.org/jira/browse/ARROW-14573
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> Generated code should be checked for staleness in CI.





[jira] [Closed] (ARROW-14079) Compute IR filter consumer

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14079.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> Compute IR filter consumer
> --
>
> Key: ARROW-14079
> URL: https://issues.apache.org/jira/browse/ARROW-14079
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> This task is to track the filter node implementation for compute IR





[jira] [Closed] (ARROW-14080) Compute IR Aggregate consumer

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14080.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> Compute IR Aggregate consumer
> -
>
> Key: ARROW-14080
> URL: https://issues.apache.org/jira/browse/ARROW-14080
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> This task is to track implementation of a compute IR consumer for aggregate 
> nodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-14572) [IR] Look into vendoring nanopb

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14572.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [IR] Look into vendoring nanopb
> ---
>
> Key: ARROW-14572
> URL: https://issues.apache.org/jira/browse/ARROW-14572
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> nanopb is a lighter weight alternative to libprotobuf that we can potentially 
> use for the IR C++ protobuf consumption



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-14081) Expose API for Arrow C++ IR Consumer

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14081.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> Expose API for Arrow C++ IR Consumer
> 
>
> Key: ARROW-14081
> URL: https://issues.apache.org/jira/browse/ARROW-14081
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> To allow clients in other languages than C++ to send IR to the Arrow C++ 
> compute engine, we need to expose a public API that allows producers to send 
> IR to the consumer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-16633) [C++] Arrow compute IR consumer converts Decimal literals incorrectly

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-16633.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [C++] Arrow compute IR consumer converts Decimal literals incorrectly
> -
>
> Key: ARROW-16633
> URL: https://issues.apache.org/jira/browse/ARROW-16633
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Compute IR
>Reporter: Ben Kietzman
>Priority: Minor
>  Labels: good-first-issue, pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Promotion of non-minor PR https://github.com/apache/arrow/pull/13215
> Decimal literal conversion memcpys from a garbage address



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-16479) Why can't we concatenate two tables across both axis?

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-16479.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> Why can't we concatenate two tables across both axis? 
> --
>
> Key: ARROW-16479
> URL: https://issues.apache.org/jira/browse/ARROW-16479
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Compute IR, Parquet, Python
>Affects Versions: 7.0.0
> Environment: Linux
>Reporter: Mausam Kumar
>Priority: Major
> Fix For: 7.0.2
>
>
> Right now, pa.concat only concatenates tables along axis=0. We already have 
> an append function; can we not extend it?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-14571) [IR] Bring in substrait protobuf message definitions and generate code

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14571.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [IR] Bring in substrait protobuf message definitions and generate code
> --
>
> Key: ARROW-14571
> URL: https://issues.apache.org/jira/browse/ARROW-14571
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> We need to bring in protocol buffer message definitions from substrait and 
> use them to generate C++ code for use in the IR consumer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-14399) [IR] Add mark, single, and delim join types for correlated subquery support

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14399.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [IR] Add mark, single, and delim join types for correlated subquery support
> ---
>
> Key: ARROW-14399
> URL: https://issues.apache.org/jira/browse/ARROW-14399
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> DuckDB transforms correlated subqueries into joins a la 
> https://cs.emis.de/LNI/Proceedings/Proceedings241/383.pdf and subsequently 
> introduces a few new join types: mark, single and delim join. This JIRA is to 
> track adding these to the compute IR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-14078) Compute IR project consumer

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14078.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> Compute IR project consumer
> ---
>
> Key: ARROW-14078
> URL: https://issues.apache.org/jira/browse/ARROW-14078
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> This task is to track an IR consumer for project nodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-15832) [C++] Revisit function categories in the context of the query engine

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-15832.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [C++] Revisit function categories in the context of the query engine
> 
>
> Key: ARROW-15832
> URL: https://issues.apache.org/jira/browse/ARROW-15832
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Compute IR
>Reporter: Eduardo Ponce
>Priority: Minor
>
> [This PR thread on whether a cumulative sum function should be treated as a 
> Scalar or Vector 
> function|https://github.com/apache/arrow/pull/12460#issuecomment-1057521694], 
> brought up an interesting discussion on the classes of functions currently 
> available in Arrow.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-14082) [Python] Expose Arrow C++ Consumer API to pyarrow

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14082.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [Python] Expose Arrow C++ Consumer API to pyarrow
> -
>
> Key: ARROW-14082
> URL: https://issues.apache.org/jira/browse/ARROW-14082
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++, Compute IR, Python
>Reporter: Phillip Cloud
>Priority: Major
>
> Once we have ARROW-14081, we need to add pyarrow bindings to allow use from 
> Python.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-14574) [IR] Look into JSON gen for protos similar to flatc

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer closed ARROW-14574.
---
Resolution: Won't Do

The Compute IR component has been abandoned in favor of Substrait. Please see 
[here|https://lists.apache.org/thread/o9vf5fzkkpjzxqsw2bd5b6grh9ov8ys2] for 
details on engaging with the Substrait project.

> [IR] Look into JSON gen for protos similar to flatc
> ---
>
> Key: ARROW-14574
> URL: https://issues.apache.org/jira/browse/ARROW-14574
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Compute IR
>Reporter: Phillip Cloud
>Priority: Major
>
> {{flatc}} has utilities for converting a flatbuffer binary message blob to 
> JSON, which is useful for testing and verification. We should figure out how 
> to do this with protocol buffers and see what existing tools are available 
> for this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16463) [C++] Add support for non-local filesystem URIs in the Substrait consumer

2022-09-21 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16463:
---

Assignee: (was: Richard Tia)

> [C++] Add support for non-local filesystem URIs in the Substrait consumer
> -
>
> Key: ARROW-16463
> URL: https://issues.apache.org/jira/browse/ARROW-16463
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Priority: Major
>  Labels: substrait
>
> Currently the Substrait consumer only accepts URIs that use the {{file}} 
> scheme.  We should add support for URI schemes that we support ({{s3}}, 
> {{gcfs}}) similar to the way pyarrow can create filesystems from URIs.
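
Accepting more than the {{file}} scheme amounts to dispatching on the URI scheme before opening anything. A minimal sketch using only the Python standard library follows; the function name `split_uri` and the `SUPPORTED_SCHEMES` set are hypothetical, and the set simply repeats the schemes named in the issue text.

```python
from urllib.parse import urlsplit

# Schemes named in the issue text; the real set of supported schemes may differ.
SUPPORTED_SCHEMES = {"file", "s3", "gcfs"}


def split_uri(uri):
    """Return (scheme, path) for a supported URI, or raise ValueError."""
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URI scheme: {parts.scheme!r}")
    # For bucket-style URIs the netloc holds the bucket name.
    return parts.scheme, parts.netloc + parts.path
```

A consumer would then pick a filesystem implementation from the returned scheme and open the returned path on it.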



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16463) [C++] Add support for non-local filesystem URIs in the Substrait consumer

2022-09-21 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607873#comment-17607873
 ] 

Todd Farmer commented on ARROW-16463:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Add support for non-local filesystem URIs in the Substrait consumer
> -
>
> Key: ARROW-16463
> URL: https://issues.apache.org/jira/browse/ARROW-16463
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Assignee: Richard Tia
>Priority: Major
>  Labels: substrait
>
> Currently the Substrait consumer only accepts URIs that use the {{file}} 
> scheme.  We should add support for URI schemes that we support ({{s3}}, 
> {{gcfs}}) similar to the way pyarrow can create filesystems from URIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16854) [C++] Add RoundTrip to Relations

2022-09-19 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16854:
---

Assignee: (was: Vibhatha Lakmal Abeykoon)

> [C++] Add RoundTrip to Relations
> 
>
> Key: ARROW-16854
> URL: https://issues.apache.org/jira/browse/ARROW-16854
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> As part of the effort to solve 
> https://issues.apache.org/jira/browse/ARROW-16496, the work has been split 
> into a set of tasks. The focus of this task is to provide the `ToProto` 
> function for relations in Substrait. It will have a set of child tasks, each 
> of which adds a set of relations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16854) [C++] Add RoundTrip to Relations

2022-09-19 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606674#comment-17606674
 ] 

Todd Farmer commented on ARROW-16854:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Add RoundTrip to Relations
> 
>
> Key: ARROW-16854
> URL: https://issues.apache.org/jira/browse/ARROW-16854
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> As part of the effort to solve 
> https://issues.apache.org/jira/browse/ARROW-16496, the work has been split 
> into a set of tasks. The focus of this task is to provide the `ToProto` 
> function for relations in Substrait. It will have a set of child tasks, each 
> of which adds a set of relations.





[jira] [Commented] (ARROW-16496) [C++] Add roundtrip support to plans + relations

2022-09-18 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606327#comment-17606327
 ] 

Todd Farmer commented on ARROW-16496:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Add roundtrip support to plans + relations
> 
>
> Key: ARROW-16496
> URL: https://issues.apache.org/jira/browse/ARROW-16496
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
> Labels: substrait
>
> Currently we have a significant amount of effort put into round tripping 
> expressions and types but we don't yet have support for relations and plans.  
> Doing so would make it possible to do things like save off Arrow exec plans 
> created by non-substrait means (for example, to save off a plan created by 
> pyarrow or R's current non-substrait dplyr implementation).
> It would also help with testing as it would enable more round-trip testing.





[jira] [Assigned] (ARROW-16496) [C++] Add roundtrip support to plans + relations

2022-09-18 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16496:
---

Assignee: (was: Vibhatha Lakmal Abeykoon)

> [C++] Add roundtrip support to plans + relations
> 
>
> Key: ARROW-16496
> URL: https://issues.apache.org/jira/browse/ARROW-16496
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
> Labels: substrait
>
> Currently we have a significant amount of effort put into round tripping 
> expressions and types but we don't yet have support for relations and plans.  
> Doing so would make it possible to do things like save off Arrow exec plans 
> created by non-substrait means (for example, to save off a plan created by 
> pyarrow or R's current non-substrait dplyr implementation).
> It would also help with testing as it would enable more round-trip testing.





[jira] [Assigned] (ARROW-16211) [C++][Python] Unregister compute functions

2022-09-14 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16211:
---

Assignee: (was: Vibhatha Lakmal Abeykoon)

> [C++][Python] Unregister compute functions
> --
>
> Key: ARROW-16211
> URL: https://issues.apache.org/jira/browse/ARROW-16211
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++, Python
> Reporter: Vibhatha Lakmal Abeykoon
> Priority: Major
>
> In general, when using UDFs, the user defines a function expecting a
> particular outcome. When building the program, there needs to be a way to
> update existing function kernels if the use case expands beyond what was
> originally planned. In such situations, there should be a way to remove the
> existing definition and add a new one. To enable this, unregister
> functionality has to be included.





[jira] [Commented] (ARROW-16211) [C++][Python] Unregister compute functions

2022-09-14 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17604851#comment-17604851
 ] 

Todd Farmer commented on ARROW-16211:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++][Python] Unregister compute functions
> --
>
> Key: ARROW-16211
> URL: https://issues.apache.org/jira/browse/ARROW-16211
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++, Python
> Reporter: Vibhatha Lakmal Abeykoon
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
>
> In general, when using UDFs, the user defines a function expecting a
> particular outcome. When building the program, there needs to be a way to
> update existing function kernels if the use case expands beyond what was
> originally planned. In such situations, there should be a way to remove the
> existing definition and add a new one. To enable this, unregister
> functionality has to be included.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

