Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread Thomas Passin via Python-list

On 1/21/2024 1:25 PM, marc nicole wrote:
It is part of a larger project aiming at processing data according to a 
given algorithm

Do you have any comments or any enhancing recommendations on the code?


I'm not knowledgeable enough about either pandas or numpy, I'm afraid, 
just very basic usage.  Someone else will probably pitch in.



Thanks.

On Sun, Jan 21, 2024 at 6:28 PM Thomas Passin via Python-list
<python-list@python.org> wrote:


On 1/21/2024 11:54 AM, marc nicole wrote:
 > Thanks for the reply,
 >
 > I think using a Pandas (or a Numpy) approach would optimize the
 > execution of the program.
 >
 > Target cells could be up to 10% the size of the dataset, a good
example
 > to start with would have from 10 to 100 values.

Thanks for the reformatted code.  It's much easier to read and think
about.

For say 100 points, it doesn't seem that "optimization" would be
much of
an issue.  On my laptop machine and Python 3.12, your example takes
around 5 seconds to run and print().  OTOH if you think you will go to
much larger datasets, certainly execution time could become a factor.

I would think that NumPy arrays and/or matrices would have good
potential.

Is this some kind of a cellular automaton, or an image filtering
process?

 > Let me know your thoughts, here's a reproducible example which I
 > formatted:
 >
 > [snip: the full code listing appears in a later message below]

Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread marc nicole via Python-list
It is part of a larger project aiming at processing data according to a
given algorithm
Do you have any comments or any enhancing recommendations on the code?

Thanks.

On Sun, Jan 21, 2024 at 6:28 PM Thomas Passin via Python-list
<python-list@python.org> wrote:

> On 1/21/2024 11:54 AM, marc nicole wrote:
> > Thanks for the reply,
> >
> > I think using a Pandas (or a Numpy) approach would optimize the
> > execution of the program.
> >
> > Target cells could be up to 10% the size of the dataset, a good example
> > to start with would have from 10 to 100 values.
>
> Thanks for the reformatted code.  It's much easier to read and think about.
>
> For say 100 points, it doesn't seem that "optimization" would be much of
> an issue.  On my laptop machine and Python 3.12, your example takes
> around 5 seconds to run and print().  OTOH if you think you will go to
> much larger datasets, certainly execution time could become a factor.
>
> I would think that NumPy arrays and/or matrices would have good potential.
>
> Is this some kind of a cellular automaton, or an image filtering process?
>
> > Let me know your thoughts, here's a reproducible example which I
> formatted:
> >
> > [snip: the full code listing appears in a later message below]

Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread Thomas Passin via Python-list

On 1/21/2024 11:54 AM, marc nicole wrote:

Thanks for the reply,

I think using a Pandas (or a Numpy) approach would optimize the 
execution of the program.


Target cells could be up to 10% the size of the dataset, a good example 
to start with would have from 10 to 100 values.


Thanks for the reformatted code.  It's much easier to read and think about.

For say 100 points, it doesn't seem that "optimization" would be much of 
an issue.  On my laptop machine and Python 3.12, your example takes 
around 5 seconds to run and print().  OTOH if you think you will go to 
much larger datasets, certainly execution time could become a factor.
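
To make that kind of judgment concrete, run time is easy to measure directly. A minimal illustrative harness (not from the thread) using only the standard library:

```python
import time

def time_run(fn, *args, **kwargs):
    """Crude wall-clock timing of a single call; adequate for
    runs in the seconds range like the one described above."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# toy workload standing in for the example script
result, elapsed = time_run(sum, range(1_000_000))
print(result)
```

Timing the real `main()` the same way at several dataset sizes shows quickly whether optimization is worth the effort.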


I would think that NumPy arrays and/or matrices would have good potential.

Is this some kind of a cellular automaton, or an image filtering process?
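
As one hedged sketch of what the NumPy route could look like (illustrative names, not the original code): the four contour values of every cell can be gathered at once by slicing shifted views of a NaN-padded array, instead of looping over coordinates.

```python
import numpy as np

def contour_stacks(a):
    """Return a (4, H, W) float array whose slices hold, for every cell,
    the value of its up/down/left/right neighbour; out-of-bounds
    neighbours come out as NaN thanks to the padding."""
    p = np.pad(a.astype(float), 1, constant_values=np.nan)
    up = p[:-2, 1:-1]      # value at (row-1, col)
    down = p[2:, 1:-1]     # value at (row+1, col)
    left = p[1:-1, :-2]    # value at (row, col-1)
    right = p[1:-1, 2:]    # value at (row, col+1)
    return np.stack([up, down, left, right])

a = np.arange(9).reshape(3, 3)
shifts = contour_stacks(a)
print(shifts[0, 1, 1])  # the "up" neighbour of cell (1, 1) is cell (0, 1)
```

Each of the four slices is a view into the padded array, so this costs one pass over the data rather than one Python-level loop iteration per cell.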


Let me know your thoughts, here's a reproducible example which I formatted:

[snip: the full code listing appears in a later message below]

On Sun, Jan 21, 2024 at 4:33 PM Thomas Passin via Python-list
<python-list@python.org> wrote:


On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
 > [snip: the original question appears in full below]

Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread marc nicole via Python-list
Thanks for the reply,

I think using a Pandas (or a Numpy) approach would optimize the execution
of the program.

Target cells could be up to 10% the size of the dataset, a good example to
start with would have from 10 to 100 values.

Let me know your thoughts, here's a reproducible example which I formatted:



from numpy import random
import pandas as pd
from itertools import product


def select_target_values(dataframe, number_of_target_values):
    target_cells = []
    for _ in range(number_of_target_values):
        row_x = random.randint(0, len(dataframe.columns) - 1)
        col_y = random.randint(0, len(dataframe) - 1)
        target_cells.append((row_x, col_y))
    return target_cells


def select_contours(target_cells):
    contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    contour_cells = []
    for target_cell in target_cells:
        # random contour count for each cell
        contour_cells_count = random.randint(1, 4)
        try:
            contour_cells.append(
                [
                    tuple(
                        map(
                            lambda i, j: i + j,
                            (target_cell[0], target_cell[1]),
                            contour_coordinates[iteration_],
                        )
                    )
                    for iteration_ in range(contour_cells_count)
                ]
            )
        except IndexError:
            continue
    return contour_cells


def create_zipf_distribution():
    zipf_dist = random.zipf(2, size=(50, 5)).reshape((50, 5))
    zipf_distribution_dataset = pd.DataFrame(zipf_dist).round(3)
    return zipf_distribution_dataset


def apply_contours(target_cells, contour_cells):
    target_cells_with_contour = []
    # create one single list of cells
    for idx, target_cell in enumerate(target_cells):
        target_cell_with_contour = [target_cell]
        target_cell_with_contour.extend(contour_cells[idx])
        target_cells_with_contour.append(target_cell_with_contour)
    return target_cells_with_contour


def create_possible_datasets(dataframe, target_cells_with_contour):
    all_datasets_final = []
    dataframe_original = dataframe.copy()

    list_tuples_idx_cells_all_datasets = list(
        filter(
            lambda x: x,
            [list(tuples) for tuples in product(*target_cells_with_contour)],
        )
    )
    target_original_cells_coordinates = [
        target_and_contour_cell[0]
        for target_and_contour_cell in target_cells_with_contour
    ]
    for dataset_index_values in list_tuples_idx_cells_all_datasets:
        all_datasets = []
        for idx_cell in range(len(dataset_index_values)):
            dataframe_cpy = dataframe.copy()
            # iat takes (row, col), i.e. (y, x)
            dataframe_cpy.iat[
                target_original_cells_coordinates[idx_cell][1],
                target_original_cells_coordinates[idx_cell][0],
            ] = dataframe_original.iloc[
                dataset_index_values[idx_cell][1],
                dataset_index_values[idx_cell][0],
            ]
            all_datasets.append(dataframe_cpy)
        all_datasets_final.append(all_datasets)
    return all_datasets_final


def main():
    zipf_dataset = create_zipf_distribution()

    target_cells = select_target_values(zipf_dataset, 5)
    print(target_cells)
    contour_cells = select_contours(target_cells)
    print(contour_cells)
    target_cells_with_contour = apply_contours(target_cells, contour_cells)
    datasets = create_possible_datasets(zipf_dataset, target_cells_with_contour)
    print(datasets)


main()
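
One property of the listing above worth keeping in mind for larger datasets: `product(*target_cells_with_contour)` takes one pick per target-cell group, so the number of generated datasets is the product of the group sizes. A quick check with made-up groups (the coordinates here are illustrative only):

```python
from itertools import product
from math import prod

# hypothetical groups: each inner list is a target cell plus its contour cells
target_cells_with_contour = [
    [(2, 3), (2, 4), (3, 3)],          # target + 2 contours
    [(0, 1), (1, 1)],                  # target + 1 contour
    [(4, 4), (4, 5), (5, 4), (3, 4)],  # target + 3 contours
]

combos = list(product(*target_cells_with_contour))
print(len(combos))  # 3 * 2 * 4 = 24 candidate datasets
assert len(combos) == prod(len(group) for group in target_cells_with_contour)
```

With 10 targets and full contours that is on the order of 5**10 DataFrame copies, which is where execution time would really start to matter.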

On Sun, Jan 21, 2024 at 4:33 PM Thomas Passin via Python-list
<python-list@python.org> wrote:

> On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> > [snip: question and reply appear in full in the messages below]

Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread Thomas Passin via Python-list

On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:

Hello,

I have an initial dataframe with a random list of target cells (each cell
being identified with a couple (x,y)).
I want to yield four different dataframes each containing the value of one
of the contour (surrounding) cells of each specified target cell.

the surrounding cells to consider for a specific target cell are : (x-1,y),
(x,y-1),(x+1,y);(x,y+1), specifically I randomly choose 1 to 4 cells from
these and consider for replacement to the target cell.

I want to do that through a pandas-specific approach without having to
define the contour cells separately and then apply the changes on the
dataframe 


1. Why do you want a Pandas-specific approach?  Many people would rather 
keep code independent of special libraries if possible;


2. How big can these collections of target cells be, roughly speaking? 
The size could make a big difference in picking a design;


3. You really should work on formatting code for this list.  Your code 
below is very complex and would take a lot of work to reformat to the 
point where it is readable, especially with the nearly impenetrable 
arguments in some places.  Probably all that is needed is to replace all 
tabs by (say) three spaces, and to make sure you intentionally break 
lines well before they might get word-wrapped.  Here is one example I 
have reformatted (I hope I got this right):


list_tuples_idx_cells_all_datasets = list(filter(
    lambda x: utils_tuple_list_not_contain_nan(x),
    [list(tuples) for tuples in list(
        itertools.product(*target_cells_with_contour))
    ]))

4. As an aside, it doesn't look like you need to convert all those 
sequences and iterators to lists all over the place;
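
A toy illustration of that point (not the original code): a single for loop consumes `filter`, `map`, and `itertools.product` lazily, so the intermediate `list(...)` wrappers only pay off when the result is indexed or traversed more than once.

```python
from itertools import product

pairs = product([1, 2], [10, 20])         # an iterator; no list() required
non_empty = filter(None, ([], [3], [4]))  # also lazy

total = 0
for a, b in pairs:  # consumed one tuple at a time
    total += a * b
collected = list(non_empty)  # materialize only where a list is really needed

print(total)      # 1*10 + 1*20 + 2*10 + 2*20 = 90
print(collected)  # [[3], [4]]
```

Dropping the redundant `list()` calls avoids building throwaway intermediate lists, which matters once the `product` output gets large.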




(but rather using an all in one approach):
for now I have written this example which I think is not Pandas specific:

[snip]

--
https://mail.python.org/mailman/listinfo/python-list


How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list according to a Pandas-way?

2024-01-21 Thread marc nicole via Python-list
Hello,

I have an initial dataframe with a random list of target cells (each cell
being identified with a couple (x,y)).
I want to yield four different dataframes each containing the value of one
of the contour (surrounding) cells of each specified target cell.

The surrounding cells to consider for a specific target cell are (x-1,y),
(x,y-1), (x+1,y), (x,y+1); specifically, I randomly choose 1 to 4 cells from
these and consider each as a replacement for the target cell.

I want to do that through a pandas-specific approach, without having to
define the contour cells separately and then apply the changes on the
dataframe (but rather using an all-in-one approach).
For now I have written this example, which I think is not Pandas-specific:

def select_target_values(dataframe, number_of_target_values):
    target_cells = []
    for _ in range(number_of_target_values):
        row_x = random.randint(0, len(dataframe.columns) - 1)
        col_y = random.randint(0, len(dataframe) - 1)
        target_cells.append((row_x, col_y))
    return target_cells


def select_contours(target_cells):
    contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    contour_cells = []
    for target_cell in target_cells:
        # random contour count for each cell
        contour_cells_count = random.randint(1, 4)
        try:
            contour_cells.append(
                [tuple(map(lambda i, j: i + j,
                           (target_cell[0], target_cell[1]),
                           contour_coordinates[iteration_]))
                 for iteration_ in range(contour_cells_count)])
        except IndexError:
            continue
    return contour_cells


def apply_contours(target_cells, contour_cells):
    target_cells_with_contour = []
    # create one single list of cells
    for idx, target_cell in enumerate(target_cells):
        target_cell_with_contour = [target_cell]
        target_cell_with_contour.extend(contour_cells[idx])
        target_cells_with_contour.append(target_cell_with_contour)
    return target_cells_with_contour


def create_possible_datasets(dataframe, target_cells_with_contour):
    all_datasets_final = []
    dataframe_original = dataframe.copy()
    # check for nans
    list_tuples_idx_cells_all_datasets = list(filter(
        lambda x: utils_tuple_list_not_contain_nan(x),
        [list(tuples) for tuples in
         list(itertools.product(*target_cells_with_contour))]))
    target_original_cells_coordinates = list(map(
        lambda x: x[0],
        [target_and_contour_cell
         for target_and_contour_cell in target_cells_with_contour]))
    for dataset_index_values in list_tuples_idx_cells_all_datasets:
        all_datasets = []
        for idx_cell in range(len(dataset_index_values)):
            dataframe_cpy = dataframe.copy()
            dataframe_cpy.iat[
                target_original_cells_coordinates[idx_cell][1],
                target_original_cells_coordinates[idx_cell][0],
            ] = dataframe_original.iloc[
                dataset_index_values[idx_cell][1],
                dataset_index_values[idx_cell][0],
            ]
            all_datasets.append(dataframe_cpy)
        all_datasets_final.append(all_datasets)
    return all_datasets_final


If you have a better Pandas approach (unifying all these methods into one
that makes use of DataFrame methods only), please let me know.
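
Not an authoritative answer to that request, but one possible sketch of the "all in one" direction, staying close to DataFrame operations. Every name below is illustrative, and `candidate_datasets` makes its own assumptions (only in-bounds contour cells are kept, and `iat` is indexed as (row, col)):

```python
from itertools import product

import numpy as np
import pandas as pd

def candidate_datasets(df, targets):
    """Yield one DataFrame per combination of contour choices: each
    target (x, y) gets overwritten by one of its in-bounds neighbours."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    groups = []
    for x, y in targets:
        contours = [(x + dx, y + dy) for dx, dy in offsets
                    if 0 <= x + dx < df.shape[1] and 0 <= y + dy < df.shape[0]]
        groups.append(contours)
    for choice in product(*groups):
        out = df.copy()
        for (x, y), (cx, cy) in zip(targets, choice):
            out.iat[y, x] = df.iat[cy, cx]  # iat takes (row, col) == (y, x)
        yield out

df = pd.DataFrame(np.arange(12).reshape(4, 3))
datasets = list(candidate_datasets(df, [(1, 1)]))
print(len(datasets))  # interior cell (1, 1) has 4 in-bounds contour cells
```

Because the function is a generator, the datasets can also be consumed one at a time instead of being materialized all at once.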

thanks!